META SEARCH ENGINE

BACKGROUND OF THE INVENTION 
This application is a conversion of copending 
provisional application 60/062,958, filed October 10, 
1997. 

A number of useful and popular search engines 
attempt to maintain full text indexes of the World Wide 
Web. For example, search engines are available from 
AltaVista, Excite, HotBot, Infoseek, Lycos and Northern 
Light. However, searching the Web can still be a slow 
and tedious process. Limitations of the search services 
have led to the introduction of meta search engines. A
meta search engine searches the Web by making requests to
multiple search engines such as AltaVista or Infoseek.
The primary advantages of current meta search engines are
the ability to combine the results of multiple search
engines and the ability to provide a consistent user
interface for searching these engines. Experimental
results show that the major search engines index a 
relatively small amount of the Web and that combining the 
results of multiple engines can therefore return many 
documents that would otherwise not be found. 

A number of meta search engines are currently
available. Some of the most popular ones are
MetaCrawler, Inference Find, SavvySearch, Fusion,
ProFusion, Highway 61, Mamma, Quarterdeck WebCompass,
Symantec Internet FastFind, and ForeFront WebSeeker.



The principal motivation behind the basic text
meta search capabilities of the meta search engine of
this invention was the poor precision, limited coverage,
limited availability, limited user interfaces, and
out-of-date databases of the major Web search engines. More
specifically, the diverse nature of the Web and the focus
of the Web search engines on handling relatively simple
queries very quickly lead to search results that often have
poor precision. Additionally, the practice of "search
engine spamming" has become popular, whereby users add
possibly unrelated keywords to their pages in order to
alter the ranking of their pages. The relevance of a
particular hit is often obvious only after waiting for
the page to load and finding the query term(s) in the
page.

Experience with using different search engines
suggests that the coverage of the individual engines is
relatively low, i.e. searching with a second engine will
often return several documents which were not returned by
the first engine. It has been suggested that AltaVista
limits the number of pages indexed per domain, and that
each search engine has a different strategy for selecting
pages to index. Experimental results confirm that the
coverage of any one search engine is very limited.

In addition, due to search engine and/or
network difficulties, the engine which responds the
quickest varies over time. It is possible to add a
number of features which enhance usability of the search
engines. Centralized search engine databases are always
out of date. There is a time lag between the time when
new information is made available and the time that it is
indexed.

SUMMARY OF THE INVENTION 

An object of this invention is to improve meta
search engines.

Another object of the present invention is to 
provide a meta search engine that analyzes each document 
and displays local context around the query terms. 

A further object of this invention is to 
provide a search method that improves on the efficiency 
of existing search methods. 

A further object of this invention is to 
provide a meta search engine that is capable of 
displaying the context of the query terms, advanced 
duplicate detection, progressive display of results, 
highlighting query terms in the pages when viewed, 
insertion of quick jump links for finding the query terms 
in large pages, dramatically improved precision for 
certain queries by using specific expressive forms, 
improved relevancy ranking, improved clustering, and 
image search. 

These and other objectives are attained with a
computer implemented meta search engine and search
method. In accordance with this method, a query is
forwarded to a number of third party search engines, and
the responses from the third party search engines are
parsed in order to extract information regarding the
documents matching the query. The full text of the
documents matching the query is downloaded, and the
query terms in the documents are located. The text
surrounding the query terms is extracted, and that text
is displayed.

The engine downloads the actual pages
corresponding to the hits and searches them for the query
terms. The engine then provides the context in which the
query terms appear rather than a summary of the page
(none of the available search engines or meta search
services currently provide this option). This typically
provides a much better indication of the relevance of a
page than the summaries or abstracts used by other search
engines, and it often helps to avoid looking at a page
only to find that it does not contain the required
information. The context can be particularly helpful
whenever a search includes terms which may occur in a
different context to that required. The amount of
context is specified by the user in terms of the number
of characters either side of the query terms. Most
non-alphanumeric characters are filtered from the context in
order to produce more readable and informative results.
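
The context extraction described above can be illustrated with a minimal Python sketch. The function name `extract_contexts`, the default `width`, and the exact set of characters kept by the filter are assumptions made for illustration; the specification only states that a user-chosen number of characters on either side of each query term is shown and that most non-alphanumeric characters are filtered.

```python
import re


def extract_contexts(text, query_terms, width=40):
    """Return one context string per occurrence of any query term.

    `width` is the number of characters kept on either side of a match.
    """
    contexts = []
    for term in query_terms:
        pattern = re.compile(re.escape(term), re.IGNORECASE)
        for match in pattern.finditer(text):
            start = max(0, match.start() - width)
            end = min(len(text), match.end() + width)
            snippet = text[start:end]
            # Assumed filter: keep letters, digits, whitespace and a little
            # basic punctuation; replace everything else with a space.
            snippet = re.sub(r"[^A-Za-z0-9\s.,'-]", " ", snippet)
            # Collapse the whitespace runs left behind by the filter.
            snippet = re.sub(r"\s+", " ", snippet).strip()
            contexts.append(snippet)
    return contexts


page = "NEC has studied the digital watermark ***problem*** for images."
print(extract_contexts(page, ["digital watermark"], width=20))
```

Matching here is a case-insensitive literal match; boolean syntax and phrase handling are outside the scope of the sketch.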

Results are returned progressively after each 
individual page is downloaded and analyzed, rather than 
after all pages are downloaded. The first result is 
typically displayed faster than the average time for a 
search engine to respond. When multiple pages provide 
the information required, the architecture of the meta 
engine can be helpful because the fastest sites are the 
first ones to be analyzed and displayed.

When viewing the full pages corresponding to
the hits, these pages are filtered to highlight the query
terms and links are inserted at the top of the page which
jump to the first occurrence of each query term. Links
at each occurrence of the query terms jump to the next
occurrence of the respective term. Query term
highlighting helps to identify the query terms and page
relevance quickly. The links help to find the query
terms quickly in large documents.
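
A rough sketch of this page filtering is given below in Python. It assumes, for simplicity, that the page is plain text rather than HTML, and the anchor naming scheme (`t<term>_<occurrence>`), the use of bold for highlighting, and the function name `annotate_page` are hypothetical choices, not details taken from the specification.

```python
import html
import re


def annotate_page(text, query_terms):
    """Highlight query terms in plain text and add jump links (HTML output)."""
    # Collect every match as (start, end, term index).
    matches = []
    for t_idx, term in enumerate(query_terms):
        for m in re.finditer(re.escape(term), text, re.IGNORECASE):
            matches.append((m.start(), m.end(), t_idx))
    matches.sort()

    # Drop matches that overlap an earlier match.
    kept, last_end = [], 0
    for start, end, t_idx in matches:
        if start >= last_end:
            kept.append((start, end, t_idx))
            last_end = end

    total = [sum(1 for _, _, t in kept if t == i) for i in range(len(query_terms))]
    seen = [0] * len(query_terms)
    out, pos = [], 0
    for start, end, t_idx in kept:
        out.append(html.escape(text[pos:start]))
        n = seen[t_idx]
        seen[t_idx] += 1
        word = html.escape(text[start:end])
        if n + 1 < total[t_idx]:
            # Each occurrence links to the next occurrence of the same term.
            out.append(f'<a name="t{t_idx}_{n}" href="#t{t_idx}_{n + 1}"><b>{word}</b></a>')
        else:
            out.append(f'<a name="t{t_idx}_{n}"><b>{word}</b></a>')
        pos = end
    out.append(html.escape(text[pos:]))

    # Links at the top jump to the first occurrence of each term found.
    header = " ".join(
        f'<a href="#t{i}_0">{html.escape(term)} ({total[i]})</a>'
        for i, term in enumerate(query_terms) if total[i]
    )
    return header + "<hr>" + "".join(out)


print(annotate_page("foo bar foo", ["foo"]))
```

A filter for real Web pages would have to avoid rewriting text inside tags and attributes, which this sketch does not attempt.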

Pages which are no longer available can be 
identified. These pages are listed at the end of the 
response. Some other meta search services also provide
"dead link" detection; however, the feature is usually
turned off by default and no results are returned until
all pages are checked. For the meta search engine of
this invention, however, the feature is intrinsic to the
architecture of the engine, which is able to produce
results both incrementally and quickly.

Pages which no longer contain the search terms 
or that do not properly match the query can be 
identified. These pages are listed after pages which 
properly match the query. This can be very important:
different engines use different relevance techniques, and
if just one engine returns poor relevance results, this
can lead to poor results from standard meta search
techniques.

The tedious process of requesting additional
hits can be avoided. The meta search engine understands
how to extract the URL for requesting the next page of
hits from the individual search engine responses. More
advanced detection of duplicate pages is done. Pages are
considered duplicates if the relevant context strings are
identical. This allows the detection of a duplicate even if
the page has a different header or footer.
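
Duplicate detection by context strings can be sketched as follows in Python. The function name `split_duplicates` and the representation of a page as a `(url, context_strings)` pair are assumptions for illustration; the only rule taken from the text is that two pages are duplicates when their context strings are identical.

```python
def split_duplicates(pages):
    """Split pages into unique pages and duplicates.

    `pages` is a list of (url, context_strings) pairs, in arrival order.
    A page is a duplicate if an earlier page had identical context strings.
    """
    seen = set()
    unique, duplicates = [], []
    for url, contexts in pages:
        key = tuple(contexts)  # lists are not hashable, tuples are
        if key in seen:
            duplicates.append(url)
        else:
            seen.add(key)
            unique.append(url)
    return unique, duplicates


pages = [
    ("http://a.example/page", ["nec digital watermark research"]),
    ("http://mirror.example/page", ["nec digital watermark research"]),
    ("http://b.example/other", ["a different digital watermark"]),
]
print(split_duplicates(pages))
```

Because only the text around the query terms is compared, two copies of a page that differ in their header or footer still compare equal.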

U.S. Patent 5,659,732 (Kirsch) presents a 
technique for relevance ranking with meta search 
techniques wherein the underlying search engines are 
modified to return extra information such as the number 
of occurrences of each search term in the documents and 
the number of occurrences in the entire database. Such a 
technique is not required for the meta search engine of 
this invention because the actual pages are downloaded 
and analyzed. It is therefore possible to apply a 
uniform ranking measure to documents returned by 
different engines. Currently, the engine displays pages 
in descending order of the number of query terms present 
in the document (if none of the first few pages contain 
all of the query terms, then the engine initially 
displays results which contain the maximum number of 
query terms found in a page so far). After all pages
have been downloaded, the engine then relists the pages 
according to a simple relevance measure. 

This measure currently considers the number of
query terms present in the document, the proximity
between query terms, and term frequency (the usual
inverse document frequency may also be useful (Salton, G.
(1989), Automatic text processing: the transformation,
analysis and retrieval of information by computer,
Addison-Wesley)):

\[
R = c_1 N_p + \frac{c_2 - \dfrac{\sum_{i=1}^{N_p-1} \sum_{j=i+1}^{N_p} \min\bigl(d(i,j),\, c_2\bigr)}{\sum_{k=1}^{N_p-1} (N_p - k)}}{c_2 / c_1} + \frac{N_t}{c_3}
\]


where N_p is the number of query terms that are present in
the document (each term is counted only once), N_t is the
total number of query terms in the document, d(i,j) is
the minimum distance between the ith and the jth of the
query terms which are present in the document (currently
in terms of the number of characters), c_1 is a constant
which controls the overall magnitude of the relevance
measure R, c_2 is a constant specifying the maximum
distance between query terms which is considered useful,
and c_3 is a constant specifying the importance of term
frequency (currently c_1 = 100, c_2 = 5000, and c_3 = 10c_1).
This measure is used for pages containing more than one
of the query terms; when only one query term is found the
term's distance from the start of the page is used.
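
As a worked illustration, the Python sketch below computes a relevance value under the assumption that the measure has the form R = c_1 N_p + (c_2 - D) / (c_2 / c_1) + N_t / c_3, where D is the average over all pairs of present query terms of min(d(i,j), c_2). Measuring distances between the start offsets of matches, and substituting the first offset of the term for D when only one term is present, are further assumptions; the function names are hypothetical.

```python
import re
from itertools import combinations

C1 = 100.0      # overall magnitude of the measure
C2 = 5000.0     # maximum useful distance between query terms (characters)
C3 = 10 * C1    # importance of term frequency


def term_positions(text, term):
    """Start offsets of every case-insensitive occurrence of `term`."""
    return [m.start() for m in re.finditer(re.escape(term), text, re.IGNORECASE)]


def min_distance(pos_a, pos_b):
    """Minimum character distance between any occurrence of two terms."""
    return min(abs(a - b) for a in pos_a for b in pos_b)


def relevance(text, query_terms):
    positions = {t: term_positions(text, t) for t in query_terms}
    present = [t for t in query_terms if positions[t]]
    n_p = len(present)                               # distinct terms present
    n_t = sum(len(positions[t]) for t in present)    # total occurrences
    if n_p == 0:
        return 0.0
    if n_p == 1:
        # Assumption: use the distance from the start of the page instead.
        avg = min(positions[present[0]][0], C2)
    else:
        pairs = list(combinations(present, 2))       # N_p (N_p - 1) / 2 pairs
        avg = sum(
            min(min_distance(positions[a], positions[b]), C2) for a, b in pairs
        ) / len(pairs)
    return C1 * n_p + (C2 - avg) / (C2 / C1) + n_t / C3


near = "alpha beta"
far = "alpha" + " x" * 100 + " beta"
print(relevance(near, ["alpha", "beta"]), relevance(far, ["alpha", "beta"]))
```

With these constants, each additional distinct query term contributes 100 to R, the proximity term contributes at most 100, and each occurrence contributes 0.001, so term presence dominates proximity, and proximity dominates raw frequency.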

This ranking criterion is particularly useful
with Web searches. A query for multiple terms on the Web
often returns documents which contain all terms, but the
terms are far apart in the document and may be in
unrelated sections of the page, e.g. in separate Usenet
messages archived on a single Web page, or in separate
bookmarks on a page containing a list of bookmarks.

The engine does not use the lowest common
denominator in terms of the search syntax. The engine
supports all common search formats, including boolean
syntax. Queries are dynamically modified in order to
match each individual query syntax. The engine is 
capable of tracking the results of queries, automatically 
informing users when new documents are found which match 
a given query. The engine is capable of tracking the 
text of a given page, automatically informing the user 
when the text changes and which lines have changed. The 
engine includes an advanced clustering technique which 
improves over the clustering done in existing search 
engines. A specific expressive forms search technique 
can dramatically improve precision for certain queries. 
A new query expansion technique can automatically perform 
intelligent query expansion. 

Additional features which could easily be added
to the meta search engine of this invention include:
improved relevance measures; alternative ordering
methods, e.g. by site; field searching, e.g. page title,
Usenet message subject, or hyperlink text; rules and/or
learning methods for routing queries to specific search
engines; word sense disambiguation; and relevance
feedback.

Further benefits and advantages of the 
invention will become apparent from a consideration of 
the following detailed description, given with reference 
to the accompanying drawings, which specify and show 
preferred embodiments of the invention. 



BRIEF DESCRIPTION OF THE DRAWINGS 



Figure 1 shows the home page of the meta search 
engine of this invention. 

Figure 2 shows the options page of the meta
search engine of this invention.

Figures 3-8 show, respectively, first through
sixth portions of a sample response of the meta search
engine of the present invention for the query nec and
"digital watermark."

Figure 9 shows a sample page view for the meta
search engine of this invention.

Figure 10 is a simplified control flow chart of
the meta search engine of the present invention.

Figure 11 is a simplified control flow chart
for image meta search.

Figure 12 shows a first portion of a sample
response of the meta search engine of this invention for
the query koala in the image databases, filtered for
photos.

Figure 13 shows a second portion of a sample
response of the meta search engine of this invention for
the query koala in the image databases, filtered for
photos.

Figure 14 shows a sample response of the meta
search engine of this invention for the query koala in
the image databases, filtered for graphics.

Figure 15 shows clusters for the query "joydeep
ghosh."

Figure 16 shows the first two cluster summaries
for the query "joydeep ghosh."

Figure 17 shows the first part of the clusters
for the query "joydeep ghosh" from HuskySearch.

Figure 18 shows the second part of the clusters
for the query "joydeep ghosh" from HuskySearch.

Figure 19 shows clusters for the query "joydeep
ghosh" from AltaVista.

Figure 20 shows clusters produced by the meta
search engine of this invention for the query "neural
network."

Figure 21 shows clusters produced by the meta 
search engine of this invention for the query typing and 
injury along with the first cluster summary. 

Figure 22 shows the response of the meta search 
engine of the present invention for the query What does 
NASDAQ stand for? 

Figure 23 shows the response of Infoseek for 
the query What does NASDAQ stand for? 

Figure 24 shows the response of the meta search 
engine of this invention for the query How is a rainbow 
created? 

Figure 25 shows the response of Infoseek for 
the query How is a rainbow created? 

Figure 26 shows the response of the meta search
engine of the present invention for the query What is a
mealy machine?

Figure 27 shows a sample home page showing new
hits for a query and recently modified URLs.

Figure 28 shows a sample page view showing the
text which has been added to the page since the last time
it was viewed.

Figure 29 shows the coverage of each of six
search engines with respect to the combined coverage of
all six.

Figure 30 shows the coverage as the number of
search engines is increased.

Figure 31 shows a comparison of the overlap
between search engines to the number of documents
returned from all six engines combined.

Figure 32 shows the coverage of each search
engine with respect to the estimated size of the
indexable Web.

Figures 33 and 34 show histograms of the major
search engine response times, and a histogram of the
response time for the first response when queries are
made to the six engines simultaneously.

Figure 35 shows the median time for the Web
search engines to respond.

Figure 36 shows the median time for the first
of n Web search engines to respond.

Figure 37 shows the response time for arbitrary
Web pages.

Figure 38 shows the median time to download the
first of n pages requested simultaneously.

Figure 39 shows the time for the meta engine to
display the first result.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 

One of the fundamental features of the meta
search engine of this invention is that it analyzes each
document and displays local context around the query
terms. The benefit of displaying the local context,
rather than an abstract or summary of the document, is 
that the user may be able to more readily determine if 
the document answers his or her specific query. In 
essence, this technique admits that the computer may not 
be able to accurately determine the relevance of a 
particular document, and in lieu of this ability, formats 
the information in the best way for the user to quickly 
determine relevance. A user can therefore find documents 
of high relevance by quickly scanning the local context 
of the query terms. This technique is simple, but can be 
very effective, especially in the case of Web search 
where the database is very large, diverse, and poorly 
organized. 

The idea of querying and collating results from 
multiple databases is not new. Companies like PLS, 
Lexis-Nexis, and Verity have long since created systems 
which integrate the results of multiple heterogeneous 
databases. Many other Web meta search services exist 
such as the popular and useful MetaCrawler service. 
Services similar to MetaCrawler include SavvySearch,
Inference Find, Fusion, ProFusion, Highway 61, Mamma, 
Quarterdeck WebCompass, Metabot, Symantec Internet 
FastFind, and WebSeeker. 

Figure 1 shows the home page of the meta search
engine of this invention. The bar 12 at the top contains
links for the options page, the help page, and the
submission of suggestions and problems. Queries are
entered into the "Find:" box 14. The selection of which
search engines to use for the query is made by clicking
on the appropriate selection on the following line. The
options are currently:

1. Web - standard Web search engines: (a)
AltaVista, (b) Excite, (c) Infoseek, (d) HotBot, (e)
Lycos, (f) Northern Light, (g) WebCrawler, and (h) Yahoo.

2. Usenet Databases - indexes of Usenet
newsgroups: (a) AltaVista, (b) DejaNews, (c)
Reference.com.

3. Press - indexes of press articles and news
wires: (a) Infoseek NewsWire, Industry, and Premier
sources - c/o Infoseek - Reuters, PR NewsWire etc., and
(b) NewsTracker - c/o Excite - online newspapers and
magazines.

4. Images - image indexes: (a) Corel - Corel
image database, (b) HotBot - HotBot images, (c) Lycos -
Lycos images, (d) WebSeer - WebSeer images, (e) Yahoo -
Yahoo images, and (f) AltaVista - AltaVista images.

5. Journals - academic journals: (a) Science.

6. Tech - technical news: (a) TechWeb and (b)
ZDNet.

7. All - all of the above.

The constraints menu 16 follows, which contains
options for constraining the results to specific domains,
specific page ages, and specific image types. The main
options menu 20 follows, which contains options for
selecting the maximum number of results, the amount of
context to display around the query terms (in
characters), and whether or not to activate clustering or
tracking.

The options link on the top bar allows setting
a number of other options, as shown at 22 in figure 2.
These options are: 1. the timeout (per individual page
download), 2. whether or not to filter the pages when
viewed, 3. whether or not to filter images from the pages
when viewed, 4. whether each search displays results in a
new window or not, and 5. whether or not to perform image
classification (for manual classification of images).
Additionally, the options page shows at 24 and 26 which
queries and URLs are being tracked for changes, and
allows entering a new URL to track.

Figures 3 to 8 show a sample response of the
meta search engine of this invention for the query nec
and "digital watermark". Figure 3 shows the top portion
of the response from the search. The search form can be
seen at the top, followed by a tip 30 which may be query
sensitive. Results which contain all of the query terms
are then displayed as they are retrieved and analyzed (as
mentioned before, if none of the first few pages contain
all of the query terms then the engine initially displays
results which contain the maximum number of query terms
found in a page so far). The bars 32 to the left of the
document titles indicate how close the query terms are in
the documents: longer bars indicate that the query terms
are closer together. The engine which found the
document, the age of the document, the size of the 
document, and the URL follow the document title. 

After the pages have been retrieved, the engine
then displays the top 20 pages ranked using term
proximity information (figure 4). In descending order,
and referring to figures 5 to 8, the engine then displays
those pages which contain fewer query terms, those pages
which contain none of the query terms, those pages which
contain duplicate context strings, and those pages which
could not be downloaded. Links to the search engine
pages which were used are then provided, followed by
terms which may be useful for query expansion. With
reference to figure 8, the engine then displays a summary
box with information on the number of documents found
from each individual engine, the number retrieved and
processed, and the number of duplicates.

Figure 9 shows a sample of how the individual
pages are processed when viewed. The links 40 at the top
jump to the first occurrence of the query terms in the
document, and indicate the number of occurrences. The
[Track Page] link activates tracking for this page: the
user will be informed when and how the document changes.

The engine comprises two main logical parts:
the meta search code and a parallel page retrieval
daemon. Pseudocode for (a simplified version of) the
search code is as follows:

Process the request to check syntax and create..
    ..regular expressions which are used to match query..
    ..terms
Send requests (modified appropriately) to all..
    ..relevant search engines
Loop for each page retrieved until maximum number..
    ..of results or all pages retrieved
    If page is from a search engine
        Parse search engine response extracting hits..
            ..and any link for the next set of results
        Send requests for all of the hits
        Send requests for the next set of results..
            ..if applicable
    Else
        Check page for query terms and create..
            ..context strings if found
        Print page information and context strings if all..
            ..query terms are found and duplicate context..
            ..strings have not been encountered before
    Endif
End loop
Re-rank pages using proximity and term frequency..
    ..information
Print page information and context strings for pages..
    ..which contained some but not all query terms
Print page information for pages which contained no..
    ..query terms
Print page information and context strings for pages..
    ..which contain duplicate context strings
Print page information for pages which could not be..
    ..downloaded
Print summary statistics.
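
The control flow of this pseudocode can be approximated by the following sequential Python sketch. It is only an illustration under stated assumptions: the `fetch` callable, the dictionary shapes it returns (`kind`, `hits`, `next`, `text`), the fixed context width, and the report categories are all hypothetical, pages are processed one at a time rather than as they arrive in parallel, and the re-ranking and printing steps are omitted.

```python
import re
from collections import deque


def meta_search(query_terms, start_urls, fetch, max_results=10, width=20):
    """Classify retrieved pages the way the search loop does.

    `fetch(url)` is assumed to return None for an unavailable page,
    {"kind": "engine", "hits": [...], "next": url_or_None} for a search
    engine response, or {"kind": "page", "text": "..."} for a document.
    """
    patterns = [re.compile(re.escape(t), re.IGNORECASE) for t in query_terms]
    queue = deque(start_urls)
    seen_contexts = set()
    report = {"all": [], "some": [], "none": [], "duplicate": [], "dead": []}

    while queue and len(report["all"]) < max_results:
        url = queue.popleft()
        response = fetch(url)
        if response is None:
            report["dead"].append(url)
        elif response["kind"] == "engine":
            # Request every hit, then the next page of hits if there is one.
            queue.extend(response["hits"])
            if response.get("next"):
                queue.append(response["next"])
        else:
            text = response["text"]
            found = sum(1 for p in patterns if p.search(text))
            contexts = tuple(
                text[max(0, m.start() - width):m.end() + width]
                for p in patterns for m in p.finditer(text)
            )
            if found == 0:
                report["none"].append(url)
            elif contexts in seen_contexts:
                report["duplicate"].append(url)
            else:
                seen_contexts.add(contexts)
                report["all" if found == len(patterns) else "some"].append(url)
    return report


# A tiny in-memory stand-in for the Web, used only for this example.
web = {
    "engine1": {"kind": "engine", "hits": ["p1", "p2", "gone"], "next": "engine1b"},
    "engine1b": {"kind": "engine", "hits": ["p3", "p4"], "next": None},
    "p1": {"kind": "page", "text": "nec digital watermark"},
    "p2": {"kind": "page", "text": "nec digital watermark"},
    "p3": {"kind": "page", "text": "only nec here"},
    "p4": {"kind": "page", "text": "nothing relevant"},
}
print(meta_search(["nec", "digital watermark"], ["engine1"], web.get))
```

In the example, `p2` is reported as a duplicate of `p1`, `gone` as a page that could not be downloaded, and the second results page is followed automatically.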

Figure 10 shows a simplified control flow
diagram 50 of the meta search engine. The page retrieval
engine is relatively simple but does incorporate features
such as queuing requests and balancing the load from
multiple search processes, and delaying requests to the
same site to prevent overloading a site. The page
retrieval engine comprises a dispatch daemon and a number
of client retrieval processes. Pseudocode for (a
simplified version of) the dispatch daemon is as follows:
Start clients
Loop
    Check for timeout of active clients
    Send any queued requests if possible, balancing..
        ..load for requests from multiple search..
        ..processes
    If there is a message from a client
        If message is "replace me" replace the..
            ..client with a new process
        If message is "done" update client..
            ..information
        If message is "status" return status
        If message is "get" then
            If all clients are busy or a request..
                ..has been made to this site..
                ..within the last x seconds then..
                ..queue the request
            Otherwise send request to a client
        Endif
    Endif
End loop
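
The queuing rule in the "get" branch can be illustrated with a small single-threaded Python simulation. The class name `Dispatcher`, its methods, and the explicit `now` argument (used instead of a real clock so the behaviour is easy to follow) are assumptions for this sketch; client timeouts, the "replace me" and "status" messages, and load balancing across search processes are left out.

```python
from collections import deque
from urllib.parse import urlparse


class Dispatcher:
    """Queue a request if no client is idle or its site was hit too recently."""

    def __init__(self, num_clients, site_delay):
        self.idle_clients = num_clients
        self.site_delay = site_delay      # the "x seconds" of the pseudocode
        self.last_request = {}            # site -> time of the last request
        self.queue = deque()
        self.sent = []                    # (time, url) pairs, for inspection

    def _can_send(self, url, now):
        last = self.last_request.get(urlparse(url).netloc)
        recent = last is not None and now - last < self.site_delay
        return self.idle_clients > 0 and not recent

    def _send(self, url, now):
        self.idle_clients -= 1
        self.last_request[urlparse(url).netloc] = now
        self.sent.append((now, url))

    def get(self, url, now):
        """Handle a "get" message from a search process."""
        if self._can_send(url, now):
            self._send(url, now)
        else:
            self.queue.append(url)

    def done(self, now):
        """Handle a "done" message from a client, then retry queued requests."""
        self.idle_clients += 1
        self.flush(now)

    def flush(self, now):
        # Try each queued request once, in order; re-queue those still blocked.
        for _ in range(len(self.queue)):
            url = self.queue.popleft()
            if self._can_send(url, now):
                self._send(url, now)
            else:
                self.queue.append(url)


d = Dispatcher(num_clients=2, site_delay=5)
d.get("http://a.com/1", now=0)   # sent immediately
d.get("http://a.com/2", now=1)   # queued: a.com was contacted 1 second ago
d.get("http://b.com/1", now=1)   # sent: different site, one client still idle
d.done(now=6)                    # a client finishes; the queued request goes out
print(d.sent)
```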

The client processes simply retrieve the
relevant pages, handling errors and timeouts, and return
the pages directly to the appropriate search process.

The algorithm used for image meta search in the
meta search engine of this invention is as follows:
Process the request to check syntax and create..
    ..regular expressions which are used to match..
    ..query terms
Send requests (modified appropriately) to all..
    ..relevant image search engines
Loop for each page retrieved until maximum number of..
    ..images or all pages retrieved
    If page is from a search engine
        Parse search engine response extracting..
            ..hits and any link for the next set..
            ..of results
        Send requests for all of the hits
        Send request for the next set of results..
            ..if applicable
    Else if page is an image
        Add image to the display queue
    Else
        Analyze query term locations in the page..
            ..and predict which (if any) of the..
            ..images on the page corresponds to..
            ..the query - send a request to..
            ..download this image
    Endif
    If n images are in the display queue
        Create a single image montage of the..
            ..images in the queue
        Display the montage as a clickable image..
            ..where each portion of the image..
            ..corresponding to the original..
            ..individual images shows a detail..
            ..page for the original image
    Endif
End loop
If any images are in the display queue
    Create a single image montage of the images in..
        ..the queue
    Display the montage as a clickable image where..
        ..each portion of the image corresponding..
        ..to the original individual images shows..
        ..a detail page for the original image
Endif
Print summary statistics

Figure 11 shows a simplified control flow 
diagram 60 for the image meta search algorithm.

Image Classification 

The Web image search engine WebSeer attempts to
classify images as photographs or graphics. WebSeer
extracts a number of features from the images and uses
decision trees for classification. We have implemented a
similar image classification system. However, we use a
different feature set and use a neural network for
classification. Figures 12 and 13 show the response of
the meta search engine of this invention to the image
query koala, with the images filtered for photos. Figure
14 shows the response when filtering for graphics.

Document Clustering

Document clustering methods typically produce
non-overlapping clusters. For example, Hierarchical
Agglomerative Clustering (HAC) algorithms, which are the
most commonly used algorithms for document clustering
(Willett, P. (1988), "Recent trends in hierarchical
document clustering: a critical review", Information
Processing and Management 24, 577-597), start with each
document in its own cluster and iteratively merge clusters
until a halting criterion is met. HAC algorithms employ
similarity functions (between documents and between sets
of documents).

A document clustering algorithm is disclosed 
herein which is based on the identification of 
co-occurring phrases and conjunctions of phrases. The 
algorithm is fundamentally different to commonly used 
methods in that the clusters may be overlapping, and are 
intended to identify common items or themes. 

The World Wide Web (the Web) is large, contains
a lot of redundancy, and has a relatively low signal to noise
ratio. These factors make finding information on the Web
difficult. The clustering algorithm presented here is 
designed as an aid to information discovery, i.e. out of 
the many hits returned for a given query, what topics are 
covered? This allows a user to refine their query in 
order to investigate one of the subtopics. 

The clustering algorithm is as follows:

Retrieve pages corresponding to the query
For each page
    For n = 1 to MaximumPhraseLength
        For each set of successive n words
            If this combination of words has not..
                ..already appeared in this..
                ..document then add the set to a..
                ..hash table for this document..
                ..and a hash table for all..
                ..documents
        End for
    End for
End for
For n = MaximumPhraseLength to 1
    Find the most common phrases of length n, to a..
        ..maximum of MaxN phrases, which occurred..
        ..more than MinN times
    Add these phrases to the set of clusters
End for
Find the most common combinations of two clusters..
    ..from the previous step, to a maximum of MaxC..
    ..combinations, for which the combination..
    ..occurred in individual documents at least..
    ..MinC times
Delete clusters which are identified by phrases..
    ..which are subsets of a phrase identifying..
    ..another cluster
Merge clusters which contain identical documents
Display each cluster along with context from a set..
    ..of pages for both the query terms..
    ..and the cluster terms.

Figure 15 shows the clusters 70 produced by
this algorithm for the query "joydeep ghosh", and figure
16 shows the first two cluster summaries 72 and 74 for
these clusters. Figures 17 and 18 show the clusters 76
and 80 produced by HuskySearch for the same query.
Figure 19 shows the clusters 82 produced by AltaVista.
Figures 20 and 21 show the clusters 84 and 86 produced by
the meta search engine of this invention for another two
queries: "neural network" and typing and injury.
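
The phrase-based clustering algorithm given earlier in this section can be sketched in Python as follows. This is an illustration under assumptions rather than the implementation used by the engine: words are obtained by lowercasing and splitting on whitespace, a "subset" phrase is taken to mean a contiguous sub-phrase, conjunctions that involve a deleted phrase are dropped, and the function and parameter names are hypothetical. The display step is omitted.

```python
from collections import defaultdict
from itertools import combinations


def is_subphrase(short, long):
    """True if `short` occurs as a contiguous, strictly shorter part of `long`."""
    n = len(short)
    return n < len(long) and any(
        long[i:i + n] == short for i in range(len(long) - n + 1)
    )


def cluster_pages(pages, max_len=3, max_n=5, min_n=1, max_c=5, min_c=2):
    """`pages` maps url -> text. Returns a list of (labels, set_of_urls)."""
    # Record, for every phrase of 1..max_len words, the documents containing
    # it; using a set counts each phrase at most once per document.
    docs_with = defaultdict(set)
    for url, text in pages.items():
        words = text.lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                docs_with[tuple(words[i:i + n])].add(url)

    # Most common phrases of each length, longest first.
    phrase_clusters = {}
    for n in range(max_len, 0, -1):
        found = [(p, d) for p, d in docs_with.items()
                 if len(p) == n and len(d) > min_n]
        found.sort(key=lambda item: (-len(item[1]), item[0]))
        phrase_clusters.update(found[:max_n])

    # Conjunctions of two phrase clusters that share enough documents.
    pairs = []
    for a, b in combinations(sorted(phrase_clusters), 2):
        shared = phrase_clusters[a] & phrase_clusters[b]
        if len(shared) >= min_c:
            pairs.append(((a, b), shared))
    pairs.sort(key=lambda item: (-len(item[1]), item[0]))

    # Delete phrases that are contained in another cluster's phrase.
    kept = {p for p in phrase_clusters
            if not any(is_subphrase(p, q) for q in phrase_clusters)}
    clusters = {(p,): set(phrase_clusters[p]) for p in phrase_clusters if p in kept}
    pairs = [item for item in pairs if item[0][0] in kept and item[0][1] in kept]
    for label, shared in pairs[:max_c]:
        clusters[label] = shared

    # Merge clusters that contain exactly the same documents.
    merged = defaultdict(list)
    for label, docs in clusters.items():
        merged[frozenset(docs)].append(label)
    return [(labels, set(docs)) for docs, labels in merged.items()]


pages = {
    "u1": "neural network training",
    "u2": "neural network hardware",
    "u3": "speech hardware",
}
for labels, docs in cluster_pages(pages, max_len=2):
    print(labels, sorted(docs))
```

Note that `u2` appears in both resulting clusters, which illustrates the overlapping nature of the clusters.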



Query Expansion 

One method of performing query expansion is to
augment the query with morphological variants of query
terms. Word stemming (Porter, M.F. (1980), "An algorithm
for suffix stripping", Program 14, 130-137) can be used
in order to treat morphological variants of a word as
identical words. Web search engines typically do not
perform word stemming, despite the fact that it would
reduce the resources required to index the Web. One
reason for the lack of word stemming by Web search
engines is that stemming can reduce precision. Stemming
considers all morphological variants. Query expansion
using all morphological variants often results in reduced
precision for Web search because the morphological
variants often refer to a different concept. Reduced
precision using word stemming is typically more
problematic on the Web as compared to traditional
information retrieval test collections, because the Web
database is larger and more diverse. A query expansion
algorithm is disclosed herein which is based on the use
of only a subset of morphological variants.
Specifically, the algorithm uses the subset of 
morphological variants which occur on a certain 
percentage of the Web pages matching the original query. 
Currently, the query terms are stemmed with the Porter 
stemmer (Porter, M.F. (1980), "An algorithm for suffix 
stripping", Program 14, 130-137.) and the retrieved pages 
can be searched for morphological variants of the query 
terms. Variants which occur on greater than 1% of the 
pages are displayed to the user for possible inclusion in 
a subsequent query. No quantitative evaluation of this
technique has been performed; however, observation
indicates that useful terms are suggested. As an
example, for the query nec and "digital watermark", the 
following terms are suggested for query expansion: 
digitally, watermarking, watermarks, watermarked. 

Currently the technique does not automatically 
expand a query when first entered, because the query 
expansion terms are not known until the query is 
complete. However, the technique can be made automatic by
maintaining a database of expansion terms for each query 
term. The first query containing a term can add the co- 
occurring morphological variants to the database, and 
subsequent queries can use these terms, and update the 
database if required. 
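
The database of expansion terms described above might, as one non-limiting sketch, be kept as a mapping from each query term to the variants observed so far. The class and method names are hypothetical, and an in-memory dictionary is assumed in place of a persistent database.

```python
class ExpansionDatabase:
    """Maps each query term to morphological variants seen in earlier queries."""

    def __init__(self):
        self._variants = {}

    def expand(self, query_terms):
        """Return the query terms followed by any stored variants."""
        expanded = list(query_terms)
        for term in query_terms:
            for variant in sorted(self._variants.get(term, ())):
                if variant not in expanded:
                    expanded.append(variant)
        return expanded

    def update(self, term, variants):
        """Record variants found for `term` once a query has completed."""
        self._variants.setdefault(term, set()).update(variants)
```

The first query containing a term returns it unexpanded and then calls `update`; later queries containing the same term receive the stored variants from `expand`.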

Specific Expressive Forms 

Accurate information retrieval is difficult because
information can be represented in many ways, so that an
optimal retrieval system would need to incorporate
semantics and understand natural language.
Research in information retrieval often considers
techniques aimed at improving recall, e.g. word stemming
and query expansion. As mentioned earlier, it is
possible for these techniques to decrease precision,
especially in a database as diverse as the Web.

The World Wide Web contains a lot of
redundancy. Information is often contained multiple
times and expressed in different forms across the Web.
In the limit where all information is expressed in all
possible ways, high precision information retrieval would
be simple and would not require semantic knowledge - one
would only need to search for one particular way of
expressing the information. While such a goal will never
be reached for all information, experiments indicate that
the Web is already sufficient for an approach based on
this idea to be effective for certain retrieval tasks.

The method of this invention is to transform
queries in the form of a question into specific forms
for expressing the answer. For example, the query "What
does NASDAQ stand for?" is transformed into the query
"NASDAQ stands for" "NASDAQ is an abbreviation" "NASDAQ
means". Clearly the information may be contained in a
different form to these three possibilities; however, if
the information does exist in one of these forms, then
there is a high likelihood that finding these phrases
will provide the answer to the query. The technique thus
trades recall for precision. The meta search engine of
this invention currently uses the specific expressive
forms (SEF) technique for the following queries (square
brackets indicate alternatives and parentheses indicate
optional terms or alternatives):

• What [is | are] x?

• What [causes | creates | produces] x?

• What do you think [about | of | regarding] x?

• What does x [stand for | mean]?

• Where is x?

• Who is x?

• [Why | how] [is | are] (a | the) x y?

• Why do x?

• When is x?

• When do x?

• How [do | can] i x?

• How (can) [a | the] x y?

• How does [a | the] x y?

As an example of the transformations, "What
does x [stand for | mean]?" is converted to "x stands for"
"x is an abbreviation" "x means", and "What
[causes | creates | produces] x?" is transformed to "x is
caused" "x is created" "causes x" "produces x" "makes x"
"creates x".
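
The two transformations given above might be sketched as follows, purely for illustration. The phrase templates are those stated in the text; the regular expressions used to recognize the question forms, and the names `SEF_RULES` and `to_sefs`, are assumptions of this sketch rather than a description of the engine.

```python
import re

# Two of the transformations given in the text. The regular expressions are
# an assumption about how the question forms could be recognized.
SEF_RULES = [
    (re.compile(r"^what does (?P<x>.+) (?:stand for|mean)\?$", re.IGNORECASE),
     ["{x} stands for", "{x} is an abbreviation", "{x} means"]),
    (re.compile(r"^what (?:causes|creates|produces) (?P<x>.+)\?$", re.IGNORECASE),
     ["{x} is caused", "{x} is created", "causes {x}",
      "produces {x}", "makes {x}", "creates {x}"]),
]


def to_sefs(question):
    """Return the SEF phrases for a question, or None if no rule matches."""
    question = question.strip()
    for pattern, templates in SEF_RULES:
        match = pattern.match(question)
        if match:
            x = match.group("x")
            return [template.format(x=x) for template in templates]
    return None
```

A query that matches none of the rules returns `None`, in which case it would simply be submitted unchanged.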

Different search engines use different stop
words and relevance measures, and this tends to result in
some engines returning many pages not containing the
SEFs. The offending phrases are therefore filtered out
from the queries for the relevant engines.

Figure 22 shows at 90 the response of the meta
search engine of this invention for the query "What does
NASDAQ stand for?" The answer to the query is contained
in the local context displayed for about 5 out of the
first 6 pages. Figure 23 shows at 92 the response of
Infoseek to the same query. The answer to the query is
not displayed in the page summaries, and which, if any,
of the pages contains the answer is not clear. Figures
24 and 25 show, at 94 and 96, the meta search engine of
this invention and Infoseek responses to the query "How
is a rainbow created?" Again, the answer is contained in
the local context shown by the meta search engine of this
invention, but it is not clear which, if any, of the pages
listed by Infoseek contain the answer to the question.
Figure 26 shows at 100 a third example of the response
from the meta search engine of the invention for the
query "What is a mealy machine?"

It is reasonable to expect that the amount of 
easily accessible information will increase over time, 
and therefore that the viability of the specific 
expressive forms technique will improve over time. An 
extension of the above-discussed procedures is to define 
an order over the various SEFs, e.g. "x stands for" may
be more likely to find the answer to "What does x stand
for?" than the phrase "x means". If none of the SEFs are
found then the engine could fall back to a standard 
query. 
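
One possible reading of this extension, offered only as a sketch, is to try the SEFs in priority order and to issue the standard query when none of them yields a hit. The function name is hypothetical, and `search` is an assumed caller-supplied function that takes a query string and returns a list of matching documents.

```python
def answer_with_sefs(ordered_sefs, fallback_query, search):
    """Try each SEF phrase in priority order; fall back to a standard query.

    `search` is a hypothetical caller-supplied function that takes a query
    string and returns a list of matching documents.
    Returns (phrase that produced the hits, hits); the phrase is None when
    the fallback query was used.
    """
    for phrase in ordered_sefs:
        hits = search('"%s"' % phrase)  # quote the phrase for an exact match
        if hits:
            return phrase, hits
    return None, search(fallback_query)
```

Probing the phrases one at a time is a design choice of the sketch; an engine could equally submit all SEFs together and use the order only to rank the results.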

Search tips may be provided by the meta engine. 
These tips may include, for example, the following: 

• Use quotes for phrases, e.g. "nec research". 

• You can hide the various options above to 
save screen space by clicking on the "hide" links. 



• Window option: clicking on a hit brings up 
the page in the same window for multiple searches or a 
new window for each new search. 

• Filter option: filters pages when viewed to 
highlight query terms. Faster due to local caching of 
the page. 

• The letter(s) after the page titles identify
the search engine which provided the result (e.g.
A=AltaVista).

• The second field after the page titles is the
time since the page was last updated (e.g. 5m=5 months,
1y=1 year).

• The third field after the page titles is the
size of the page.

• The context option selects the number of 
characters to display either side of the query terms. 

• The timeout option is the maximum time to 
download each individual page. 

• Searching in "Press" is useful for higher
precision with current news topics.

• Image option: remove images from the pages
when viewed (for faster viewing).

• When viewing a filtered page, clicking on a 
query term jumps to the next occurrence of that term. 
Clicking on the last occurrence of a term jumps back to 
the first occurrence. 

• You can use "-term" to exclude a term. 

• You can search for links to a specific page,
e.g. link:www.neci.nj.nec.com/homepages/giles. Self
links are excluded.



• When in doubt use lower case. 

• This meta engine makes more than three times
as many documents available as a single search engine.
Constraining your search can help, e.g. if you want to
know what NASDAQ stands for, searching for "NASDAQ stands
for" rather than "NASDAQ" can find your answer faster
although the information may also be expressed in
alternative ways.

• Clicking on the search engine links in the
"Searching for:" line will show the search engine
response to the current query.

• You can search for images by selecting the
"images" button, e.g. "red rose".

• The bar to the left of the titles is longer
when the query terms are closer together in the document.

• The query term links in the "Searching for"
line lead to the Webster dictionary definitions.

• If you select Tracking: Yes, then your query
will be tracked and new hits will be displayed on your
customized home page similar to the "recent articles
about NEC Research".

• Select Cluster: Yes to cluster the documents
and identify common themes.

• You can filter images using a neural network
prediction of whether each image is a photo or a graphic
using the Images: option.

• A listing of pages ranked by term proximity 
is shown after all of the documents have been retrieved. 

Tracking Queries and URLs

Services such as the Informant (The Informant,
1997) track the response of Web search engines to
queries, and inform users when new documents are found.
The meta search engine of this invention supports this
function. Tracking is initiated for a query by selecting
the Track option when performing the query. A daemon
then repeats the query periodically, storing new
documents along with the time they were found. New
documents are presented to the user on the home page of
the search engine, as shown at 102 in figure 27. The
engine does not currently inform users if the documents
matching queries have changed, although this could be
added.
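
The bookkeeping performed by such a daemon might be sketched as below. This is an assumption-laden illustration: the class name, the `search` callable (query string in, list of URLs out) and the in-memory storage are hypothetical, and the periodic scheduling itself is left to the caller.

```python
import time


class QueryTracker:
    """Remembers which documents have been seen for each tracked query."""

    def __init__(self, search):
        self._search = search  # hypothetical: query string -> list of URLs
        self._found = {}       # query -> {url: time first found}

    def track(self, query):
        self._found.setdefault(query, {})

    def poll(self, now=None):
        """Repeat every tracked query; return {query: [newly found URLs]}."""
        now = time.time() if now is None else now
        new_documents = {}
        for query, seen in self._found.items():
            # dict.fromkeys drops duplicate URLs while keeping their order.
            fresh = [u for u in dict.fromkeys(self._search(query)) if u not in seen]
            for url in fresh:
                seen[url] = now  # store the time the document was found
            if fresh:
                new_documents[query] = fresh
        return new_documents
```

Calling `poll` on a timer gives the periodic repetition; the stored times allow new documents to be listed on the home page together with when they were found.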

The meta search engine of this invention also
supports tracking URLs. Tracking is initiated by
clicking the [Track page] link when viewing one of the
pages from the search engine results. Alternatively,
tracking may be initiated for an arbitrary URL using the
options page. A daemon identifies updates to the pages
being tracked, and shows a list of modified pages to the
user on the home page, as in figure 27. The [Page] link
displays the page being tracked and inserts a header at
the top showing which lines have been added or modified
since the last time the user viewed the page (e.g. see
figure 28).
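
Identifying which lines have been added or modified might be done, for example, with a line-level diff. The sketch below uses Python's standard `difflib` module; the function name is hypothetical and the choice of diff algorithm is an assumption, since the text does not state how the comparison is made.

```python
import difflib


def changed_lines(old_text, new_text):
    """Return (line_number, text) for lines of new_text that were added
    or modified relative to old_text. Line numbers are 1-based."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    changes = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            for j in range(j1, j2):
                changes.append((j + 1, new_lines[j]))
    return changes
```

The returned line numbers could then be listed in the header inserted at the top of the tracked page.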

Estimating the Coverage of Search Engines 
and the Size of the Web 

As the World Wide Web continues to expand, it
is becoming an increasingly important resource for
scientists. Immediate access to all scientific
literature has long been a dream of scientists, and the
Web search engines have made a large and growing body of
scientific literature and other information resources
easily accessible. The major Web search engines are
commonly believed to index a large proportion of the Web.
Important questions which impact the choice of search
methodology include: What fraction of the Web do the
search engines index? Which search engine is the most
comprehensive? How up to date are the search engine
databases?

A number of search engine comparisons are
available. Typically, these involve running a set of
queries on a number of search engines and reporting the
number of results returned by each engine. Results of
these comparisons are of limited value because search
engines can return documents which do not contain the
query terms. This may be due to (a) the information
retrieval technology used by the engine, e.g. Excite uses
"concept-based clustering" and Infoseek uses morphology -
these engines can return documents with related words,
(b) documents may no longer exist - an engine which never
deletes invalid documents would be at an advantage, and
(c) documents may still exist but may have changed and no
longer contain the query terms.

Selberg and Etzioni (Selberg, E. and Etzioni,
O. (1995), Multi-service search and comparison using the
MetaCrawler, in 'Proceedings of the 1995 World Wide Web
Conference') have presented results based on the usage
logs of the MetaCrawler meta search service (due to
substantial changes in the search engine services and the
Web, it is expected that their results would be
significantly different if repeated now). These results
considered the following engines: Lycos, WebCrawler,
InfoSeek, Galaxy, Open Text, and Yahoo. Selberg and
Etzioni's results are informative but limited for several
reasons.

First, they present the "market share" of each 
engine which is the percentage of documents that users
follow that originated from each of the search engines.
These results are limited for a number of reasons, 
including (a) relevance is difficult to determine without 
viewing the pages, and (b) presentation order affects 
user relevance judgements (Eisenberg, M. and Barry, C. 
(1986), Order effects: A preliminary study of the 
possible influence of presentation order on user 
judgements of document relevance, in "Proceedings of the 
49th Annual Meeting of the American Society for 
Information Science", Vol. 23, pp. 80-86). 

The results considered by Selberg and Etzioni 
are also limited because they present results on the 
percentage of unique references returned and the coverage 
of each engine. Their results suggest that each engine
covers only a fraction of the Web; however, their results
are limited because (a) as above, engines can return
documents which do not contain the query terms - engines
which return documents with related words or invalid
documents can result in significantly different results,
and (b) search engines return documents in different
orders, meaning that all documents need to be retrieved
for a valid comparison, e.g. two search engines may index
exactly the same set of documents yet return a different
set as the first x.

In addition, Selberg and Etzioni find that the 
percentage of invalid links was 15%. They do not break 
this down by search engine. Selberg and Etzioni do point 
out limitations in their study (which is just a small 
part of a larger paper on the very successful MetaCrawler
service).

AltaVista and Infoseek have recently confirmed 
that they do not provide comprehensive coverage on the 
Web (Brake, D. (1997), "Lost in cyberspace", New
Scientist 154(2088), 12-13). Discussed below are
estimates on how much they do cover.

We have produced statistics on the coverage of 
the major Web search engines, the size of the Web, and 
the recency of the search engine databases. Only the 6
current major full-text search engines are considered
herein (in alphabetical order): AltaVista, Excite,
HotBot, Infoseek, Lycos, and Northern Light. A common 
perception is that these engines index roughly the same 
documents, and that they index a relatively large portion 
of the Web. 

We first compare the number of documents 
returned when using different combinations of 1 to 6 
search engines. Our overall methodology is to retrieve 
the list of matching documents from all engines and then 
retrieve all of the documents for analysis. Two 
important constraints were used. 



The first constraint was that the entire list
of documents matching the query must have been retrieved
for all of the search engines in order for a query to be
included in the study. This constraint is important
because the order in which the engines rank documents
varies between engines. Consider a query which resulted
in greater than 1,000 documents from each engine. If we
only compared the first 200 documents from each engine we
may find many unique URLs. However, we would not be able
to determine if the engines were indexing unique URLs, or
if they were indexing the same URLs but returning a
different subset as the first 200 documents.

The second constraint was that for all of the
documents that each engine lists as matching the query,
we attempted to download the full text of the
corresponding URL. Only documents which could be
downloaded and which actually contain the query terms are
counted. This is important because (a) some engines can
return documents which they believe are relevant but do
not contain the query terms (e.g. Excite uses
"concept-based clustering" and may consider related words,
and Infoseek uses morphology), and (b) each search engine
contains a number of invalid links, and the percentage of
invalid links varies between the search engines (engines
which do not delete invalid links would be at an
advantage).

Other details important to the analysis are: 
1. Duplicates are removed when considering the
total number of documents returned by one engine or by a
combination of engines, including detection of identical
pages with different URLs. URLs are normalized by
a) removing any "index.html" suffix or trailing "/", b)
removing a port 80 designation (the default), c) removing
the first segment of the domain name for URLs with a
directory depth greater than 1 (in order to account for
machine aliases), and d) unescaping any "escaped"
characters (e.g. %7E in a URL is equivalent to the tilde
character).

2. We consider only lowercase queries because
different engines treat capitalized queries differently
(e.g. AltaVista returns only capitalized results for
capitalized queries).

3. We used an individual page timeout of 60
seconds. Pages which timed out were not included in the
analysis.

4. We use a fixed maximum of 700 documents per
query (from all engines combined after the removal of
duplicates) - queries returning more documents were not
included. The search engines typically impose a maximum
number of documents which can be retrieved (current
limits are AltaVista 200, Infoseek 500, HotBot 1,000,
Excite 1,000, Lycos 1,000, and Northern Light > 10,000)
and we checked to ensure that these limits were not
exceeded (using this constraint no query returned more
than the maximum from each engine, notably no query
returned more than 200 documents from AltaVista).

5. We only counted documents which contained
the exact query terms, i.e. the word "crystals" in a
document would not match a query term of "crystal" - the
non-plural form of the word would have to exist in the
document in order for the document to be counted as
matching the query. This is necessary because different
engines use different morphology rules.

6. HotBot and AltaVista can identify alternate
pages with the same information on the Web. These
alternate pages are included in the statistics (as they
are for the engines which do not identify alternate pages
with the same data).

7. The "special collection" (premier documents
not part of the publicly indexable Web) of Northern Light
was not used.
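
Details 1 and 5 above lend themselves to a short sketch, given here for illustration only. The function names are hypothetical. Two points are assumptions of the sketch rather than statements of the text: "directory depth" is taken to be the number of path segments remaining after the "index.html" suffix is removed, and the first domain segment is only dropped when the host name has at least three segments.

```python
import re
from urllib.parse import unquote, urlsplit


def normalize_url(url):
    """Normalize a URL using rules (a) to (d) of detail 1."""
    parts = urlsplit(url if "//" in url else "//" + url)
    host = (parts.hostname or "").lower()
    port = parts.port
    path = unquote(parts.path)                 # (d) unescape, e.g. %7E -> ~
    if path.endswith("/index.html"):           # (a) drop "index.html" ...
        path = path[: -len("index.html")]
    path = path.rstrip("/")                    # (a) ... or a trailing "/"
    # Assumption: depth = number of remaining path segments.
    depth = len([segment for segment in path.split("/") if segment])
    # Assumption: keep at least two host segments when dropping the alias.
    if depth > 1 and host.count(".") >= 2:     # (c) drop the machine alias
        host = host.split(".", 1)[1]
    if port not in (None, 80):                 # (b) port 80 is the default
        host = "%s:%d" % (host, port)
    return host + path


def contains_exact_terms(text, query_terms):
    """True only if every query term appears as a whole word (detail 5)."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return all(term.lower() in words for term in query_terms)
```

Two URLs are then treated as duplicates when `normalize_url` returns the same string for both. Detection of identical pages with different URLs requires comparing page contents and is not covered by this sketch, nor are multi-word phrase terms.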

Over a period of time, we have collected 500
queries which satisfy the constraints. For the results
presented herein, we performed the 500 queries during the
period 8/23/97 to 8/24/97. We manually checked that all
results were retrieved and parsed correctly from each
engine before and after the tests because the engines
periodically change their formats for listing documents
and/or requesting the next page of documents (we also use
automatic methods designed to detect temporary failures
and changes in the search engine response formats).

Figure 29 shows the fraction of the total
number of documents from the 6 engines which were
retrieved by each individual engine. Table 1 below shows
these results along with the 95% confidence interval.
HotBot is the most comprehensive in this comparison.
These results are specific to the particular queries
performed and the state of the engine databases at the
time they were performed. Also, the results may be
partly due to different indexing rather than different
database sizes - different engines may not index
identical words for the same documents (for example, the
engines typically impose a maximum file size and
effectively truncate oversized documents).



TABLE 1

Search Engine     Coverage WRT 6 Engines   95% confidence interval
HotBot            39.2%                    +/-1.4%
Excite            31.1%                    +/-1.2%
Northern Light    30.4%                    +/-1.3%
AltaVista         29.2%                    +/-1.2%
Infoseek          17.9%                    +/-1.1%
Lycos             12.2%                    +/-1.1%



Figure 30 shows the average fraction of
documents retrieved by 1 to 6 search engines normalized
by the number retrieved from all six engines. For 1 to 5
engines, the average is over all combinations of the
engines, which is averaged for each query and then
averaged over queries. Using the assumption that the
coverage increases logarithmically with the number of
search engines, and that, in the limit, an infinite
number of search engines would cover the entire Web, f(x)
= b(1 - 1/exp(ax)), where a and b are constants and x
is the number of search engines, was fit to the data
(using Levenberg-Marquardt minimization (Fletcher, R.
(1987), Practical Methods of Optimization, Second
Edition, John Wiley & Sons) with the default parameters
in the program gnuplot) and plotted on figure 30. This
is equivalent to the assumption that each engine covers a
certain fixed percentage of the Web, and each engine's
sample of the Web is drawn independently from all Web
pages (c_i = c_{i-1} + c_1(1 - c_{i-1}), i = 2...n, where
c_i is the coverage of i engines and c_1 is the coverage
of one engine).
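
The averaging over engine combinations, and the stated equivalence between the fitted curve and the independence assumption, can be checked with a small sketch. The function names are hypothetical, and the sketch handles a single query only; averaging over queries and the Levenberg-Marquardt fit itself are omitted. With b = 1 and a = -ln(1 - c_1), the recurrence and f(x) agree.

```python
import math
from itertools import combinations


def mean_coverage(results, k):
    """Average fraction of all unique documents found by k engines, taken
    over all combinations of k engines, for a single query.

    `results` maps an engine name to the set of documents it returned;
    at least one document is assumed to have been returned overall.
    """
    everything = set().union(*results.values())
    fractions = [
        len(set().union(*(results[name] for name in combo))) / len(everything)
        for combo in combinations(sorted(results), k)
    ]
    return sum(fractions) / len(fractions)


def independent_coverage(c1, n):
    """Coverage of n engines from c_i = c_{i-1} + c_1 (1 - c_{i-1})."""
    c = c1
    for _ in range(n - 1):
        c = c + c1 * (1.0 - c)
    return c


def saturating_model(x, a, b=1.0):
    """f(x) = b (1 - 1/exp(a x))."""
    return b * (1.0 - 1.0 / math.exp(a * x))
```

The recurrence gives c_n = 1 - (1 - c_1)^n, which is the closed form that `saturating_model` evaluates when b = 1.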

There are a number of important biases which
should be considered. Search engines typically do not
consider indexing documents which are hidden behind
search forms, and documents where the engines are
excluded by the robots exclusion standard, or by
authentication requirements. Therefore, we expect the
true size of the Web to be much larger than estimated
here. However, search engines are unlikely to start
indexing these documents, and it is therefore of interest
to estimate the size of the Web that they do consider
indexing (hereafter referred to as the "indexable Web"),
and the relative comprehensiveness of the engines.

The logarithmic extrapolation above is not 
accurate for determining the size of the indexable Web 
because (a) the amount of the Web indexed by each engine 
varies significantly between the engines, and (b) the 
search engines do not sample the Web independently. All 
of the 6 search engines offer a registration function for 
users to register their pages. It is reasonable to 
assume that many users will register their pages at 
several of the engines. Therefore the pages indexed by 
each engine will be partially dependent. A second source 
of dependence between the sampling performed by each 
engine comes from the fact that search engines are
typically biased towards indexing pages which are linked
to in other pages, i.e. more popular pages.

Consider the overlap between engines a and b
in figure 31. Assuming that each engine samples the
Web independently, the quantity n_o/n_b, where n_o is the
number of documents returned by both engines and n_b is
the number of documents returned by engine b, is an
estimate of the fraction of the indexable Web, p_a,
covered by engine a. Using the coverage of 6 engines as
a reference point we can write p'_a = n_a/n_6, where
n_a is the number of documents returned by engine a and n_6
is the number of unique documents returned by the
combination of 6 engines. Thus, p'_a is the coverage of
engine a with respect to the coverage of the 6
engines, and we can write c = p'_a/p_a = n_a n_b/(n_6 n_o).
We use this
equation to estimate the size of the Web in relation to
the amount of the Web covered by the 6 engines considered
here. Because the size of the engines varies
significantly, we consider estimating the value of c
using combinations of two engines, from the smallest two
to the largest two. We limit this analysis to the 245
queries returning > 50 documents (to avoid difficulty
when n_o = 0). Table 2 shows the results. Values of c
smaller than 1 suggest that the size of the indexable Web
is smaller than the number of documents retrieved from
all 6 engines. It is reasonable to expect that larger
engines will have lower dependence because a) they can
index more pages other than the pages which users
register, and b) they can index more of the less popular
pages on the Web. Indeed, there is a clear trend where
the estimated value of c increases with the larger
engines.
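
The estimate of c for one pair of engines and one query follows directly from the four counts defined above. The sketch below is illustrative only; the function name and the use of Python sets to represent document lists are assumptions.

```python
def relative_web_size(docs_a, docs_b, docs_all):
    """Estimate c = n_a n_b / (n_6 n_o) for engines a and b.

    docs_a, docs_b: sets of documents returned by engines a and b.
    docs_all: set of unique documents returned by all engines combined.
    """
    n_a, n_b, n_6 = len(docs_a), len(docs_b), len(docs_all)
    n_o = len(docs_a & docs_b)  # documents returned by both engines
    if n_o == 0:
        raise ValueError("no overlap between the two engines")
    return (n_a * n_b) / (n_6 * n_o)
```

For example, with n_a = 40, n_b = 50, n_o = 10 and n_6 = 100, engine a covers p'_a = 0.4 of the combined documents but an estimated p_a = 10/50 = 0.2 of the indexable Web, so c = 2.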



TABLE 2

Search Engines               Engine Sizes   Estimated c   95% confidence interval
Lycos & Infoseek             Smallest       0.6           +/-0.04
Infoseek & AltaVista            |           0.9           +/-0.06
AltaVista & Northern Light      |           0.9           +/-0.04
Northern Light & Excite         v           1.9           +/-0.12
Excite & HotBot              Largest        2.2           +/-0.17



Using c = 2.2, from the comparison with the
largest two engines, we can estimate the fraction of the
indexable Web which the engines cover: HotBot 17.8%,
Excite 14.1%, Northern Light 13.8%, AltaVista 13.3%,
Infoseek 8.1%, Lycos 5.5%. These results are shown at
120 in figure 32. The percentage of the indexable Web
indexed by the major search engines is much lower than is
commonly believed. We note that (a) it is reasonable to
expect that the true value of c is actually larger than
2.2 due to the dependence which remains between the two
largest engines, and (b) different results may be found
for queries from a different class of users.

HotBot reportedly contains 54 million pages,
putting our estimate on a lower bound for the size of the
indexable Web at approximately 300 million pages.
Currently available estimates of the size of the Web vary
significantly. The Internet Archive uses an estimate of
80 million pages (excluding images, sounds, etc.)
(Cunningham, M. (1997), 'Brewster's millions',
http://www.irish-times.com/irish-times/paper/1997/0127/cmpl.html).
Forrester Research
estimates that there are more than 75 million pages
(Guglielmo, C. (1997), 'Mr. Kurnit's neighborhood', Upside
September). AltaVista now estimates that the Web
contains 100 to 150 million pages (Brake, D. (1997),
'Lost in cyberspace', New Scientist 154(2088), 12-13).
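
The arithmetic behind the coverage percentages and the lower bound of approximately 300 million pages can be reproduced as follows. The figures are taken from Tables 1 and 2 and from the reported HotBot size; the variable names are illustrative.

```python
COVERAGE_WRT_6 = {  # percentages from Table 1
    "HotBot": 39.2, "Excite": 31.1, "Northern Light": 30.4,
    "AltaVista": 29.2, "Infoseek": 17.9, "Lycos": 12.2,
}
C = 2.2              # estimated c from the two largest engines (Table 2)
HOTBOT_PAGES = 54e6  # reported size of the HotBot index

# Coverage of the indexable Web = coverage with respect to 6 engines / c.
indexable_coverage = {name: pct / C for name, pct in COVERAGE_WRT_6.items()}

# Size of the indexable Web = HotBot pages / HotBot's fraction of it.
web_size = HOTBOT_PAGES / (indexable_coverage["HotBot"] / 100.0)
```

Dividing 39.2% by 2.2 gives about 17.8% for HotBot, and 54 million pages divided by that fraction gives roughly 303 million pages.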

A simple analysis of page retrieval times leads
to some interesting conclusions. Table 3 below shows the
median time for each of the six major search engines to
respond, along with the median time for the first of the
six engines to respond when queries are made
simultaneously to all engines (as happens in the meta
engine).

TABLE 3

Search Engine                               Median Time for response (seconds)
AltaVista                                   0.9
Infoseek                                    1.3
HotBot                                      2.6
Excite                                      5.2
Lycos                                       2.8
Northern Light                              7.5
All engines                                 2.7
First of 6 engines                          0.8
First result from the meta search engine
of this invention                           1.3



Histograms of the response times for these
engines and the first of 6 engines are shown in figures
33 and 34, and the median times are shown in figure 35.
Figure 36 shows the median time for the first of n
engines to respond. These results are from September
1997, and we note that the relative speed of the search
engines varies over time.

Looking now at the time to download arbitrary
Web pages, figure 37 shows a histogram of the response
time. Figure 38 shows the median time for the first of n
engines to respond. We can estimate the time for the
meta engine to display the first result, which we create
by sampling from the distributions for the first of 6
search engines (the meta engine actually uses more than 6
search engines but we concentrate on the major Web
engines here), and the first of 10 Web pages (the actual
number depends on the number returned by the first engine
to respond), adding these together, and averaging over
10,000 trials.

Figure 39 shows a histogram of this
distribution. The median of the distribution is 1.3
seconds (compared to 2.7 seconds for the median response
time of a search engine even without downloading any
actual pages). For comparison, the average time
MetaCrawler takes to return results is 25.7 seconds
(without page verification) or 139.3 seconds (with page
verification) (Selberg, E. and Etzioni, O. (1995),
Multi-service search and comparison using the MetaCrawler, in
'Proceedings of the 1995 World Wide Web Conference') (the
underlying search engines and/or the Web appear to be
significantly faster than they were when Selberg and
Etzioni performed their experiment).
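
The sampling procedure described above might be sketched as a small Monte Carlo simulation. This is an illustration under stated assumptions, not the procedure actually used: the function name and arguments are hypothetical, the first-of-6 distribution is approximated by drawing one observed response time per engine and taking the minimum, and the first-of-10 page time is approximated by the minimum of ten draws from the observed page download times.

```python
import random
import statistics


def simulate_first_result(engine_samples, page_times, n_pages=10,
                          trials=10000, seed=0):
    """Estimate the time for a meta engine to display its first result.

    engine_samples: dict mapping engine name -> list of observed response
        times in seconds (one draw per engine per trial).
    page_times: list of observed page download times in seconds.
    Returns (median, mean) of the simulated totals.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        # Time for the first engine to respond.
        first_engine = min(rng.choice(times) for times in engine_samples.values())
        # Time for the first of n_pages downloads to complete.
        first_page = min(rng.choice(page_times) for _ in range(n_pages))
        totals.append(first_engine + first_page)
    return statistics.median(totals), statistics.mean(totals)
```

Because the engines and the page downloads are contacted in parallel, the minimum of each group, rather than the sum or the maximum, determines when the first result can be shown.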

Therefore, on average we find that the parallel
architecture of the meta engine of this invention allows
it to find, download and analyze the first page faster
than the standard search engines can produce a result
although the standard engines do not download and analyze
the pages. Note that the results in this section are
specific to the particular queries performed (speed as a
function of the query is different for each engine) and
the network conditions under which they were performed.
These factors may bias the results towards certain
engines. The non-stationarity of Web access times is not
considered here, e.g. the speed of the engines varies
significantly over time (short term variations may be due
to network or machine problems and user load, long term
variations may be due to modifications in the search
engine software, the search engine hardware resources, or
relevant network connections).

The meta search engine of this invention
demonstrates that real-time analysis of documents
returned from Web search engines is feasible. In fact,
calling the Web search engines and downloading Web pages
in parallel allows the meta search engine of this
invention to, on average, display the first result
quicker than using a standard search engine.

User feedback indicates that the display of
real-time local context around query terms, and the
highlighting of query terms in the documents when viewed,
significantly improves the efficiency of searching the
Web.

Our experiments indicate that an upper bound on
the coverage of the major search engines varies from 6%
(Lycos) to 18% (HotBot) of the indexable Web. Combining
the results of six engines returns more than 3.5 times as
many documents when compared to using only one engine.
By analyzing the overlap between search engines, we
estimate that an approximate lower bound on the size of
the indexable Web is 300 million pages. The percentage
of invalid links returned by the major engines varies
from 3% to 7%. Our results provide an indication of the
relative coverage of the major Web search engines, and
confirm that, as indicated by Selberg and Etzioni, the
coverage of any one search engine is significantly
limited.

While it is apparent that the invention herein
disclosed is well calculated to fulfill the objects
previously stated, it will be appreciated that numerous
modifications and embodiments may be devised by those
skilled in the art, and it is intended that the appended
claims cover all such modifications and embodiments as
fall within the true spirit and scope of the present
invention.

What is claimed is:

1. A computer-implemented meta search engine method,
comprising the steps of:

forwarding a query to a plurality of third
party search engines;

parsing the responses from the third party
search engines in order to extract information regarding
the documents matching the query;

downloading the full text of the documents
matching the query;

locating query terms in the documents and
extracting text surrounding the query terms; and

displaying the text surrounding the query
terms.

2. A method according to Claim 1, further including the 
step of progressively displaying the text surrounding the 
query terms as the documents are retrieved. 

3. A method according to Claim 1, further including the 
step of filtering the context strings in order to improve 
readability by removing redundant whitespace, repeated 
characters, HTML comments and tags, and special
characters.

4. A method according to Claim 1, further including the 
step of identifying and filtering pages which no longer 
contain the query terms. 



5. A method according to Claim 1, further including the 
step of clustering the documents based on analysis of the 
full text of each document and identification of co- 
occurring phrases and words, and conjunctions thereof. 

6. A method according to Claim 1, further including the 
steps of storing the documents matching a query so that a 
query can be repeated and only showing documents which 
are new or have been modified since the last query or a 
given time. 

7. A method according to Claim 1, further including the 
step of filtering the actual documents when viewed in 
full in order to (a) highlight the query terms, and (b) 
insert quick jump links so the user can quickly jump to 
the query term of interest. 

8. A method according to Claim 1, further including the 
steps of creating and using a database of 
meta-information regarding query terms, e.g. storing a list of 
movie titles, recognizing when the user enters a query 
containing a movie title, and taking a special action 
such as referring the user to the review of the movie at 
a specific movie review site. 

9. A method according to Claim 1, further including the 
step of storing and using information regarding the 
particular documents requested by a user in response to a 
query, e.g. remembering the most commonly requested 
document for a given query and presenting this document 
first in response to the same query in the future. 



10. A method according to Claim 1, further including the 
steps of analyzing the number of documents which would 
have been found as a function of the number of third 
party search engines queried, and computing the estimated 
size of the third party search engines and the estimated 
size of the document base which the third party search 
engines index. 

11. A method according to Claim 1, further including the 
step of scheduling regular searches, whereby the user is 
informed of either new or modified documents since the 
previous search. 

12. A method according to Claim 1, further including the 
step of using a more advanced detection of duplicate 
documents by identifying duplicate context even when 
documents may have different headers or footers. 

13. A method according to Claim 1, further including the 
step of caching the full documents in order to improve 
access speed. 

14. A method according to Claim 1, further including the 
step of using context sensitive suggestions based on the 
query entered, e.g. providing suggestions regarding how 
to search for a name when the query contains a single 
character that could represent an initial. 

15. A method according to Claim 1, further including the 
step of using a proximity based ranking scheme to re-rank 
documents according to the number of and proximity 
between query terms. 

16. A computer-implemented meta search engine method, 
comprising the steps of: 

forwarding a query to a third party search 
engine; 

parsing the responses from the third party 
search engine in order to extract information regarding 
the documents matching the query; 

downloading the full text of the documents 
matching the query; 

locating query terms in the documents and 
extracting text surrounding the query terms; and 

displaying the text surrounding the query 
terms. 

17. A method according to Claim 16, further including 
the step of progressively displaying the text surrounding 
the query terms as the documents are retrieved. 

18. A method according to Claim 16, further including 
the step of filtering the context strings in order to 
improve readability by removing redundant whitespace, 
repeated characters, HTML comments and tags, and special 
characters. 

19. A method according to Claim 16, further including 
the step of identifying and filtering pages which no 
longer contain the query terms. 

20. A method according to Claim 16, further including 
the step of clustering the documents based on analysis of 
the full text of each document and identification of 
co-occurring phrases and words, and conjunctions thereof. 

21. A method according to Claim 16, further including 
the steps of storing the documents matching a query so 
that a query can be repeated and only showing documents 
which are new or have been modified since the last query 
or a given time. 

22. A method according to Claim 16, further including 
the step of filtering the actual documents when viewed in 
full in order to (a) highlight the query terms, and (b) 
insert quick jump links so the user can quickly jump to 
the query term of interest. 

23. A method according to Claim 16, further including 
the steps of creating and using a database of 
meta-information regarding query terms, e.g. storing a list of 
movie titles, recognizing when the user enters a query 
containing a movie title, and taking a special action 
such as referring the user to the review of the movie at 
a specific movie review site. 

24. A method according to Claim 16, further including 
the step of storing and using information regarding the 
particular documents requested by a user in response to a 
query, e.g. remembering the most commonly requested 
document for a given query and presenting this document 
first in response to the same query in the future. 

25. A method according to Claim 16, further including 
the step of scheduling regular searches, whereby the user 
is informed of either new or modified documents since the 
previous search. 

26. A method according to Claim 16, further including 
the step of using a more advanced detection of duplicate 
documents by identifying duplicate context even when 
documents may have different headers or footers. 

27. A method according to Claim 16, further including 
the step of caching the full documents in order to 
improve access speed. 

28. A method according to Claim 16, further including 
the step of using context sensitive suggestions based on 
the query entered, e.g. providing suggestions regarding 
how to search for a name when the query contains a single 
character that could represent an initial. 

29. A method according to Claim 16, further including 
the step of using a proximity based ranking scheme to 
re-rank documents according to the number of and proximity 
between query terms. 

30. A computer-implemented keyword based image search 
engine method, comprising the steps of: 

forwarding a query to a plurality of third 
party image search engines; 

parsing the responses from the third party 
search engines in order to extract information regarding 
the images matching the query; 

downloading the images matching the query; and 

displaying thumbnails of the images to the 
user. 

31. A method according to claim 30, further including 
the step of user selectable filtering of the images based 
on size, color, or semantic attributes of the images. 

32. A method according to claim 30, further including 
the step of identifying and filtering commonly used 
images on the Web such as the Netscape Now image and 
horizontal bars used to separate sections of a document. 

33. A method according to claim 30, further including 
the step of identifying and filtering similar images. 

34. A method according to claim 30, further including 
the steps of identifying the type of an image, e.g. 
photograph, line drawing, logo, map, cartoon, portrait, 
button, chart, or astronomical pictures, and filtering 
based on the image type. 



35. A method according to claim 30, further including 
the steps of storing the images matching a query so that 
a query can be repeated, and only showing new images. 

36. A method according to claim 30, further including 
the step of storing the meta-information (e.g. type of 
image) so that images may be filtered using the 
meta-information without downloading the image again for new 
queries. 

37. A method according to claim 30, further including 
the steps of displaying the full image along with the 
document referring to it if possible, and highlighting of 
query terms in the document. 

38. A computer-implemented keyword based image search 
engine method, comprising the steps of: 

forwarding the query to a plurality of third 
party text search engines; 

parsing the responses from the third party 
search engines in order to extract information regarding 
the documents matching the query; 

downloading the documents matching the query; 

analyzing the documents and locating images 
which may match the user query based on the proximity of 
query terms to image tags or references; 

downloading the images; and 

displaying thumbnails of the images to the 
user. 

39. A method according to claim 38, further including 
the step of user selectable filtering of the images based 
on size, color, or semantic attributes of the images. 

40. A method according to claim 38, further including 
the step of identifying and filtering commonly used 
images on the Web such as the Netscape Now image and 
horizontal bars used to separate sections of a document. 

41. A method according to claim 38, further including 
the step of identifying and filtering similar images. 

42. A method according to claim 38, further including 
the steps of identifying the type of an image, e.g. 
photograph, line drawing, logo, map, cartoon, portrait, 
button, chart, or astronomical pictures, and filtering 
based on the image type. 

43. A method according to claim 38, further including 
the steps of storing the images matching a query so that 
a query can be repeated, and only showing new images. 

44. A method according to claim 38, further including 
the step of storing the meta-information (e.g. type of 
image) so that images may be filtered using the 
meta-information without downloading the image again for new 
queries. 



45. A method according to claim 38, further including 
the steps of displaying the full image along with the 
document referring to it if possible, and highlighting of 
query terms in the document. 

46. A computer-implemented meta search engine 
comprising: 

means for forwarding a query to a plurality of 
third party search engines; 

means for parsing the responses from the third 
party search engines in order to extract information 
regarding the documents matching the query; 

means for downloading the full text of the 
documents matching the query; 

means for locating query terms in the documents 
and extracting text surrounding the query terms; and 

means for displaying the text surrounding the 
query terms. 

47. A meta search engine according to Claim 46, further 
including means for the progressive display of the text 
surrounding the query terms as the documents are 
retrieved. 

48. A meta search engine according to Claim 46, further 
including means for the filtering of the context strings 
in order to improve readability by removing redundant 
whitespace, repeated characters, HTML comments and tags, 
and special characters. 

49. A meta search engine according to Claim 46, further 
including means for the identification and filtering of 
pages which no longer contain the query terms. 



50. A meta search engine according to Claim 46, further 
including a mechanism for clustering the documents based 
on analysis of the full text of each document and 
identification of co-occurring phrases and words, and 
conjunctions thereof. 

51. A meta search engine according to Claim 46, further 
including a mechanism for storing the documents matching 
a query so that a query can be repeated and for only 
showing documents which are new or have been modified 
since the last query or a given date. 

52. A computer-implemented meta search engine 
comprising: 

means for forwarding a query to a 
third party search engine; 

means for parsing the responses from the third 
party search engine in order to extract information 
regarding the documents matching the query; 

means for downloading the full text of the 
documents matching the query; 

means for locating query terms in the documents 
and extracting text surrounding the query terms; and 

means for displaying the text surrounding the 
query terms. 

53. A meta search engine according to Claim 52, further 
including means for the progressive display of the text 
surrounding the query terms as the documents are 
retrieved. 

54. A meta search engine according to Claim 52, further 
including means for the filtering of the context strings 
in order to improve readability by removing redundant 
whitespace, repeated characters, HTML comments and tags, 
and special characters. 

55. A meta search engine according to Claim 52, further 
including means for the identification and filtering of 
pages which no longer contain the query terms. 

56. A meta search engine according to Claim 52, further 
including a mechanism for clustering the documents based 
on analysis of the full text of each document and 
identification of co-occurring phrases and words, and 
conjunctions thereof. 

57. A meta search engine according to Claim 52, further 
including a mechanism for storing the documents matching 
a query so that a query can be repeated and for only 
showing documents which are new or have been modified 
since the last query or a given date. 

58. A computer-implemented keyword based image search 
engine system, comprising: 

means for forwarding a query to a number of 
third party image search engines; 

means for parsing the responses from the third 
party search engines in order to extract information 
regarding the images matching the query; 

means for downloading the images matching the 
query; and 

means for displaying thumbnails of the images 
to the user. 

59. A system according to claim 58, further including 
means for selectable filtering of the images based on 
size, color, or semantic attributes of the images. 

60. A system according to claim 58, further including 
means for identifying and filtering commonly used images 
on the Web such as the Netscape Now image and horizontal 
bars used to separate sections of a document. 

61. A system according to claim 58, further including 
means for identifying and filtering similar images. 

62. A system according to claim 58, further including 
means for identifying the type of an image, e.g. 
photograph, line drawing, logo, map, cartoon, portrait, 
button, chart, or astronomical pictures, and filtering 
based on the image type. 



63. A system according to claim 58, further including 
means for storing the images matching a query so that a 
query can be repeated, and only new images are shown. 

64. A system according to claim 58, further including 
means for storing the meta-information (e.g. type of 
image) so that images may be filtered using the 
meta-information without downloading the image again for new 
queries. 

65. A system according to claim 58, further including 
means for displaying the full image along with the 
document referring to it if possible, and means for 
highlighting of query terms in the document. 

66. A computer-implemented keyword based image search 
engine, comprising: 

means for forwarding the query to a plurality 
of third party text search engines; 

means for parsing the responses from the third 
party search engines in order to extract information 
regarding the documents matching the query; 

means for downloading the documents matching 
the query; 

means for analyzing the documents and locating 
images which may match the user query based on the 
proximity of query terms to image tags or references; 

means for downloading the images; and 

means for displaying thumbnails of the images 
to the user. 

67. A system according to claim 66, further including 
means for selectable filtering of the images based on 
size, color, or semantic attributes of the images. 

68. A system according to claim 66, further including 
means for identifying and filtering commonly used images 
on the Web such as the Netscape Now image and horizontal 
bars used to separate sections of a document. 

69. A system according to claim 66, further including 
means for identifying and filtering similar images. 

70. A system according to claim 66, further including 
means for identifying the type of an image, e.g. 
photograph, line drawing, logo, map, cartoon, portrait, 
button, chart, or astronomical pictures, and filtering 
based on the image type. 

71. A system according to claim 66, further including 
means for storing the images matching a query so that a 
query can be repeated, and only new images are shown. 

72. A system according to claim 66, further including 
means for storing the meta-information (e.g. type of 
image) so that images may be filtered using the 
meta-information without downloading the image again for new 
queries. 

73. A system according to claim 66, further including 
means for displaying the full image along with the 
document referring to it if possible, and means for 
highlighting of query terms in the document. 

74. A computer-implemented method for estimating the 
relative coverage of third-party search engines which 
comprises the steps of: 

forwarding a set of queries to two third-party 
search engines; 

retrieving the full list of results from each 
search engine; 

retrieving the text of all pages listed by the 
search engines; 

filtering out pages which are unavailable or no 
longer match the query; and 

comparing the number of remaining pages from 
each engine. 

75. A computer-implemented method for information 
retrieval which comprises the steps of: 

recognizing a query in the form of a question; 

transforming the question into a set of one or 
more specific forms in which the answer to the question 
might be expressed; and 

searching for the transformed query. 

76. The method according to claim 75, wherein the 
specific expressive forms for each type of question are 
manually written. 

77. The method according to claim 75, wherein the 
specific expressive forms for each type of question are 
learnt by analyzing the context of query terms in the 
documents which users select from the search method 
comprising the steps of: 

forwarding a query to a plurality of third 
party search engines; 

parsing the responses from the third party 
search engines in order to extract information regarding 
the documents matching the query; 

downloading the full text of the documents 
matching the query; 

locating query terms in the documents and 
extracting text surrounding the query terms; 

displaying the text surrounding the query 
terms; and 

identifying common forms of the context. 



78. A computer-implemented method for query expansion 
which comprises the steps of: 

stemming the query terms; 

searching the set of query result pages for 
commonly occurring morphological variants of the query 
terms; and 

using the commonly occurring morphological 
variants for query expansion. 
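
By way of illustration only, the query expansion steps recited in the last claim above may be sketched as follows. The sketch is not taken from the disclosure: the suffix list, the function names crude_stem and expand_query, and the min_count threshold are hypothetical, and the crude suffix stripper merely stands in for whatever stemming algorithm an embodiment would actually use.

```python
import re
from collections import Counter

# Hypothetical, deliberately crude suffix list (an assumption for this sketch).
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip one common English suffix; a stand-in for a real stemmer."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def expand_query(query_terms, result_pages, min_count=2):
    """Map each query term to variants that share its stem and occur
    at least min_count times across the result pages."""
    stems = {crude_stem(term): term.lower() for term in query_terms}
    counts = Counter()
    for page in result_pages:
        counts.update(re.findall(r"[a-z]+", page.lower()))
    expansion = {term.lower(): [] for term in query_terms}
    for word, n in counts.most_common():
        stem = crude_stem(word)
        # Keep only frequent words that share a stem with a query term
        # and are not the query term itself.
        if stem in stems and word != stems[stem] and n >= min_count:
            expansion[stems[stem]].append(word)
    return expansion
```

The variants returned for each term could then be added to the query as alternatives before it is forwarded again.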



ABSTRACT OF THE DISCLOSURE 

A computer-implemented meta search engine and 
search method. In accordance with this method, a query 
is forwarded to one or more third party search engines, 
and the responses from the third party search engine or 
engines are parsed in order to extract information 
regarding the documents matching the query. The full 
text of the documents matching the query is downloaded, 
and the query terms in the documents are located. The 
text surrounding the query terms is extracted, and that 
text is displayed. 
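
By way of illustration only, the step of locating query terms in a downloaded document and extracting the surrounding text may be sketched as follows. The function name extract_contexts and the default width of 100 characters are assumptions made for this sketch; forwarding the query, parsing the responses and downloading the documents are omitted.

```python
import re


def extract_contexts(text, query_terms, width=100):
    """Return one snippet per query-term hit, with up to `width`
    characters of context on either side of the hit."""
    snippets = []
    for term in query_terms:
        # Case-insensitive literal match; this also matches inside longer words.
        for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
            start = max(0, match.start() - width)
            end = min(len(text), match.end() + width)
            snippets.append(text[start:end])
    return snippets
```

A document for which the function returns no snippets no longer contains the query terms and could be filtered from the display on that basis.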



[Screenshot: the home page shows a Find field with the options Locality, Age limit, Depth, Images, Hits, Context, Cluster and Tracking, followed by a usage tip, local announcements, and a section titled Coverage WRT Estimated "Indexable Web" Size.] 




Figure 1. Home page of the NECI meta search engine. 
signafy Press [Stop tracking query] 

URLs being tracked: 

Page http://cesdis.gsfc.nasa.gov/linux/drivers/vortex.html [Stop tracking URL] 

Page http://www.linuxhq.com/kpatch21.html [Stop tracking URL] 

Page ftp://ftp.cygnus.com/pub/egcs/releases/ [Stop tracking URL] 

Page http://www.neci.nj.nec.com/homepages/giles/ [Stop tracking URL] 

Page ftp://ftp.kernel.org/pub/linux/kernel/v2.1/ [Stop tracking URL] 

Description of the options on the main page: 

Hits: Maximum number of hits to display excluding duplicates 

Context: Number of context characters to show either side of the query terms 

Cluster: Cluster documents after retrieval 

Tracking: Start tracking this query and tell me when new documents appear which match the 

Locality: Only show documents in this domain 

Age limit: Filter out documents older than the specified age 

Depth: Only show documents with a given subdirectory depth 

Images: Filter images and only show photos or graphics 

Figure 2. Options page of the NECI meta search engine. 




[Screenshot: results of searching for nec "digital watermark" using HotBot, Infoseek, AltaVista, Excite, Lycos and Northern Light. Each hit lists the page title, size and URL, followed by context strings surrounding the query terms, separated by "... /...". The hits shown are: Ingemar Cox Home Page; NECI Technical Report 95-10; Mass High Tech; BU CAS CS 585: Image and Video Computing -- Syllabus; SMH COMPUTERS February 20 1996: Mark to foil Net pirates.] 

[...section deleted...] 

Figure 3. First portion of a sample response of the NECI meta search engine for the query nec and "digital 
watermark". 
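
By way of illustration only, the filtering of context strings for readability, as recited in the claims (removal of redundant whitespace, repeated characters, HTML comments and tags, and special characters), may be sketched as follows. The regular expressions, the treatment of HTML character entities as the "special characters", and the function name clean_context are assumptions made for this sketch.

```python
import re


def clean_context(snippet):
    """Tidy a raw context string before it is displayed."""
    snippet = re.sub(r"<!--.*?-->", " ", snippet, flags=re.DOTALL)  # HTML comments
    snippet = re.sub(r"<[^>]+>", " ", snippet)                      # HTML tags
    snippet = re.sub(r"&[#\w]+;", " ", snippet)                     # character entities (assumed)
    snippet = re.sub(r"([^\w\s])\1{2,}", r"\1", snippet)            # runs of 3+ repeated punctuation
    snippet = re.sub(r"\s+", " ", snippet)                          # redundant whitespace
    return snippet.strip()
```

Comments are removed before tags so that a comment containing a ">" character is not cut short by the tag pattern.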




[Screenshot: further hits, each shown with a relevance score, title, size, URL and context strings surrounding the query terms. The hits shown are: NEWSbytes; tigermarktwo.html (NEC TigerMark DataBlade Module for Images); Focus on Internet.] 

[...section deleted...] 



Figure 4. Second portion of a sample response of the NECI meta search engine for the query nec and "digital 
watermark", showing the pages ranked according to the relevance measure (equation 1) which includes term 
proximity information. 
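
By way of illustration only, a ranking that takes into account both the number of query terms found and the proximity between them may be sketched as follows. This is not the relevance measure of equation 1, which is not reproduced in this portion of the document; the scoring formula and the names proximity_score and rerank are hypothetical stand-ins.

```python
import re
from itertools import combinations


def proximity_score(text, query_terms):
    """Hypothetical score: the number of distinct query terms found, plus a
    bonus that grows as the closest pair of different terms gets nearer."""
    lowered = text.lower()
    positions = {}
    for term in query_terms:
        # Literal substring matching; hits inside longer words are counted too.
        hits = [m.start() for m in re.finditer(re.escape(term.lower()), lowered)]
        if hits:
            positions[term] = hits
    score = float(len(positions))
    if len(positions) > 1:
        min_gap = min(
            abs(a - b)
            for t1, t2 in combinations(positions, 2)
            for a in positions[t1]
            for b in positions[t2]
        )
        score += 1.0 / (1.0 + min_gap)
    return score


def rerank(documents, query_terms):
    """Sort (url, text) pairs by descending score."""
    return sorted(documents, key=lambda d: proximity_score(d[1], query_terms), reverse=True)
```

Under this sketch a page containing every query term always outranks a page containing fewer terms, because the proximity bonus is never greater than one.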



[Screenshot: hits listed under the heading "Only 1 search term was found in these documents:", each shown with title, size, URL and context strings. The hits shown are: ARIS Technologies' Homepage; Psych 267 Final Projects; Digimarc receives funding of $4.5M; Newsbytes Daily Summary; A letter from the publisher TIME, December 6, 1971.] 

Figure 5. Third portion of a sample response of the NECI meta search engine for the query nec and "digital
watermark", showing pages where only one of the query terms was found.
No search terms were found in these documents:

Article Two H 1y 1k http://miavx1.muohio.edu/~whittijs/articletwo.html
Userdir rule failure The server was unable to resolve the requested / username reference, possible causes include:
Username invalid Server is unable to determine username's login directory due to insufficient privilege to

Jonathan G. Campbell University of Ulster, N. Ireland, WWW Links I n/a 1k http://w%wv,iscm. uistac.uk/-jon/lDook/
J.G. Campbell's Bookmarks From 27 August 1997 this page is *permanently* relocated to
http://www.infm.ulst.ac.uk/~jgc/book Updated 27 August 1997 - JG.Campbell@ulst.ac.uk

CIOS/Comserve WWW server address has changed L 5m 1k
http://cios.llc.rpi.edu:4997/mailboxes/comgrads\08085 1 04. 1 1 8

CIOS/Comserve WWW server address has changed The CIOS web server address has changed. It is now
http://www.cios.org Please note too the new email address for the Comserve email interface to

DEFINE IMAGE L 9m 2k http://iram.fr/doc/sic/node58.html
DEFINE IMAGE Next: DEFINE /LIKE Up: DEFINE Previous: DEFINE HEADER DEFINE IMAGE DEFINE
IMAGE Var1 File1 Key1 [Var2 File2 Key2 [...]] [/GLOBAL]

Arizona Off-Road L 3m 3k http://www.azoffroad.com
Arizona Off-Road 1833 W. Mountain View Road Phoenix, AZ 85021 ATCs MOTORCYCLES JET SKIS GO
CARTS

Resultats dans les cantons L 21d 5k Http://www.admin.ch/crvf^ore/va/19840226/can316-htm^
Votation no 316 - Resultats dans les cantons Tableau recapitulatif / deutsch Votation no 316 Resultats dans les
cantons Arrete federal concernant la perception d'une redevance sur le trafic des poids lourds du 24 juin 19

"We Know How the Parisians Felt" L 1y 6k http://electron.rutgers.edu/~myadav/war71/wall/dec27b.html
"We Know How the Parisians Felt" "We Know How the Parisians Felt" Section: Box, Page, TIME, Dec. 27, 1971
Time Correspondent Dan Coggin, who covered the war from Pakistani side, was in Dacca when that city
surrendered. His repor
The U.S. : A Policy in Shambles L 1y 6k http://electron.rutgers.edu/~myadav/war71/wall/dec20b.html
The U.S. : A Policy in Shambles The Nixon Administration drew a fusillade of criticism last week for its policy on
India and Pakistan. Two weeks ago, when war broke out between two traditional enemies, a State Department
spokesman issued
ClariNet Tearsheet: Government, Business, and General News N *X5 8k hnp;//v^vvvxlarhnet''Samples/ y nb-cther.html
ClariNet Tearsheet: Government, Business, and General News ClariNet * ClariNet Tearsheet: Government,
Business, and General News ClariNet Tearsheet: Government, Business, and General News This summary of
computer and technology news is

Figure 6. Fourth portion of a sample response of the NECI meta search engine for the query nec and "digital 
watermark", showing pages where none of the query terms were found. 



Vol. 1, No. 19 A 1y 18k http://www.med?a.sbexposxom/BULLyBUL0119.HTM
Alternate H 1y 18k http://vww.s8yboldfeport.com/BULbBUL0119.HTM
...ore than five million members Presstek floats additional stock Pearlsetters launched in Europe NEC announces
digital watermark Oracle to include software suite with Internet box Ca... /...BT have launched Presstek's
Pearlsetters in seven European countries. Reuters reports that NEC claims to have developed a digital watermark
system that could protect digital files... /...n members Presstek floats additional stock Pearlsetters launched in
Europe NEC announces digital watermark Oracle to include software suite with Internet box Canon combines
divisions within an I... /...ters in seven European countries. Reuters reports that NEC claims to have developed a
digital watermark system that could protect digital files, such as still images, video and audio, from unauth...

(http://www.videodiscovery.com/vdyweb/dvd/dvdfaq.txt) H 1m 118k
http://www.videodiscovery.com/vdyweb/dvd/dvdfaq.txt
...a on NTSC line 21. The digital standard (CGMS/D) is not yet finalized, but will apply to digital connections such
as IEEE 1394/Firewire. 3) Because of the potential for perfect digital copies, paranoid... /...isplaying it. No
unscrambled digital output is allowed until work in progress for secure digital connections is finished. On the
computer side, DVD-ROM drives and video display/decoder hardware or softw... /...d a PCM audio track. (Other
streams such as Dolby Digital audio, MPEG audio, and subpicture are not necessary for the simplest case.) Basic
DVD control codes are also needed. At the moment it's difficul... /...doing this, but it's possible. The music industry
is also requesting an "embedding signalling" or "digital watermark" copy protection feature. This applies a digital
signature to the audio in the form of supposedly ...

Hyflex J1 Launch H 3m 3k http://jpn.co.jp/jan96/jp14.html
Hyflex J1 Launch H 3m 3k http://jpn.co.jp/feb96/jp14.html
... Hyflex J1 Launch NEC Develops Digital Watermarking Technique JPN Scientists at NEC Research Institute
...nch NEC Develops Digital Watermarking Technique JPN Scientists at NEC Research Institute in Princeton,
NJ, have developed a digital watermarking method that could be us... /...ary information is increasingly an issue,"
said Tatsuo Ishigoro, associate senior vice president of NEC Corp. "... I am convinced that our watermarking
technique is a solution that will be welcomed espec... /... Hyflex J1 Launch NEC Develops Digital Watermarking
Technique JPN Scientists at NEC Research Institute in Princeton, NJ, have de... /...ique JPN Scientists at NEC
Research Institute in Princeton, NJ, have developed a digital watermarking method that could be used to protect
the copyright of images and music on the Internet. Con... /...e is no way to track its reproduction and therefore it
provides little protection against piracy. A digital watermark, however, can protect a copyright by means of an
invisible identification code that is permanentl...

Internet H n/a 20k http://netinfo.ni/uy0296/lnternet.htmi

...tscape servers. Dit kwam o.a. door het feit dat bepaalde optionele onderdelen zoals een database-connector duur
betaald moeten worden. Microsoft op zijn beurt deed daar weer een schepje bovenop door we...

...://www.microsoft.com/infoserv http://www.microsoft.com/windows http://www.netscape.com NEC ontwikkelt
Digital Watermark Technology NEC is in zijn computerlaboratoriums bezig met een digit...
/...microsoft.com/windows http://www.netscape.com NEC ontwikkelt Digital Watermark Technology NEC is in
zijn computerlaboratoriums bezig met een digitaal watermerk. Dit watermerk moet in de toekom... /...t.com/infoserv
http://www.microsoft.com/windows http://www.netscape.com NEC ontwikkelt Digital Watermark Technology
NEC is in zijn computerlaboratoriums bezig met een digitaal watermerk. Dit watermerk ...


Figure 7. Fifth portion of a sample response of the NECI meta search engine for the query nec and "digital 
watermark", showing pages which contained duplicate context strings to pages found earlier. 



Error 404 Not found - file doesn't exist or is read protected [even tried multi] Digital Image Watermarking: Main 
Project Page http://www.csugIab.OT 

Error 404 Not found Labeling Techniques for Multimedia Data: http://wv/w- 
itettudelft.ni/pda/smash/pubiio f 'benelux_cr.htmI 

Error 404 Not found Labeling Techniques for Multimedia Data: http://www-it.et.tudelft.nl/pda/smash/public/benlx96/benelux_cr.html

Error 404 Not Found Artisoft Inc. - Industry Awards and Recognition http://artisoft.com/main/overview/awards.html
Error 404 File Not Found The Rutgers Review http://electron.rutgers.edu/~nebus/

This search: +nec + "digital watermark" Search engine pages: AltaVista Page 2 Page 3 Excite Page 2 
HotBot Page 2 Infoseek Lycos Northern Light Page 2 WebCrawler Yahoo 

Query expansion (adding these words to the query may help): digitally (16) digitized (16) digit (9) digitale (8) 
digitaal (8) digitization (5) digits (3) digitize (3) watermarking (463) watermarks (127) watermarked (50)

2 terms: 70 1 term: 5 0 terms: 11 duplicate context: 9 invalid link: 5

Figure 8. Sixth (and final) portion of a sample response of the NECI meta search engine for the query nec and 
"digital watermark", showing the summary information including the number of results from each individual 
engine, etc. 
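
The grouping shown in Figures 5 to 8 (pages containing both query terms, one term, or none) can be illustrated with a minimal sketch. This is not the engine's actual code: the function names `terms_found` and `summarise`, the sample pages, and the whole-word matching rule are assumptions made only for illustration.

```python
import re
from collections import Counter


def terms_found(text, terms):
    """Count how many query terms (single words or phrases) occur in text."""
    lowered = " ".join(text.lower().split())  # normalise case and whitespace
    count = 0
    for term in terms:
        # Whole-word match; a phrase must appear with its words adjacent.
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        if re.search(pattern, lowered):
            count += 1
    return count


def summarise(pages, terms):
    """Map 'number of query terms found' to the number of pages in that group."""
    return Counter(terms_found(text, terms) for text in pages.values())


# Hypothetical downloaded pages, keyed by a made-up file name.
pages = {
    "a.html": "NEC announces a digital watermark system.",
    "b.html": "A digital   watermark protects images.",
    "c.html": "Arizona Off-Road: motorcycles and jet skis.",
}
summary = summarise(pages, ["nec", "digital watermark"])
for n in sorted(summary, reverse=True):
    print(f"{n} terms: {summary[n]}")
```

A real system would also have to track duplicate context strings and invalid links, which this sketch leaves out.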



Jump to: nec (2) digital watermark (2) [URL illegible] Track



□ NECI Technical Report 95-10 

□ NEC Research Institute, 4 Independence Way, Princeton, NJ 08540. 

Secure Spread Spectrum Watermarking for Multimedia 

Ingemar J. Cox, Joe Kilian, Tom Leighton, and Talal Shamoon. December 4, 1995. 

We describe a digital watermarking method for use in audio, image, video and multimedia data. We argue that a
watermark must be placed in perceptually significant components of a signal if it is to be robust to common signal 
distortions and malicious attack. However, it is well known that modification of these components can lead to 
perceptual degradation of the signal. To avoid this, we propose to insert a watermark into the spectral components of 
the data using techniques analogous to spread spectrum communications, hiding a narrow band signal in a wideband
channel that is the data. The watermark is difficult for an attacker to remove, even when several individuals conspire 
together with independently watermarked copies of the data. It is also robust to common signal and geometric 
distortions such as digital-to-analog and analog-to-digital conversion, resampling, and requantization, including 
dithering and recompression and rotation, translation, cropping and scaling. The same digital watermarking
algorithm can be applied to all three media under consideration with only minor modifications, making it especially 
appropriate for multimedia products. Retrieval of the watermark unambiguously identifies the owner, and the 
watermark can be constructed to make counterfeiting almost impossible. Experimental results are presented to support 
these claims. 



Figure 9. Sample page view for the NECI meta search engine. The query terms are highlighted and the links at the top 
jump directly to the first occurrence of the respective query terms. 
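
The page view described in the caption of Figure 9 (highlighted query terms plus links that jump to the first occurrence of each term) could be produced along the following lines. This is only a sketch under stated assumptions: the function name `highlight`, the `term0`, `term1` anchor ids, and the use of `<b>` tags are hypothetical, and the sketch assumes no query term also matches the inserted markup itself.

```python
import html
import re


def highlight(text, terms):
    """Return (jump_links, body): each occurrence of a term is wrapped in <b>,
    and the first occurrence of each term carries an anchor id."""
    body = html.escape(text)
    links = []
    for i, term in enumerate(terms):
        pattern = re.compile(re.escape(html.escape(term)), re.IGNORECASE)
        seen = 0

        def mark(match):
            nonlocal seen
            seen += 1
            # Only the first occurrence gets the id that the jump link targets.
            anchor = f' id="term{i}"' if seen == 1 else ""
            return f"<b{anchor}>{match.group(0)}</b>"

        body = pattern.sub(mark, body)
        links.append(f'<a href="#term{i}">{html.escape(term)} ({seen})</a>')
    return links, body


links, body = highlight(
    "NEC says the digital watermark is robust. NEC agrees.",
    ["nec", "digital watermark"],
)
print("Jump to: " + " ".join(links))
print(body)
```

The count in parentheses mirrors the "nec (2) digital watermark (2)" style of the jump bar in the figure.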




Figure 10. Simplified control flow of the meta search engine. Interactions with the page retrieval daemon are shown 
in gray. 




Figure 11. Simplified control flow for image meta search. Interactions with the page retrieval daemon are shown in
gray. 




Searching for: koala using: WebSeer Corel Lycos Yahoo HotBot AltaVista
Tip: The bar to the left of the titles is longer when the query terms are closer together in the document




Figure 12. First portion of a sample response of the NECI meta search engine for the query koala in the image 
databases, filtered for photos. 



This search: koala Search engine pages: AltaVista Images Corel Images HotBot Images Page 2 Page 3 Page 
4 Page 5 Page 6 Lycos Images Page 2 Page 3 Page 4 Page 5 WebSeer Yahoo Images



[Per-engine results table illegible in source]

More documents were found but the maximum number of hits was reached.
Filtered due to size: 12 Filtered due to type: 21

Figure 13. Second portion of a sample response of the NECI meta search engine for the query koala in the image
databases, filtered for photos.





Find: koala

Depth: Any Images: Graphics

Hits: 100 Context: 100 Cluster: No Tracking: No

Searching for: koala using: WebSeer Corel Lycos Yahoo HotBot AltaVista

Tip: You can search for links to a specific page, e.g. "link:www.neci.nj.nec.com/homepages/giles". Self links are excluded.




This search: koala Search engine pages: AltaVista Images Corel Images HotBot Images Lycos Images Page 2
Page 3 Page 4 Page 5 Page 6 Page 7 WebSeer Yahoo Images



[Per-engine results table illegible in source]

More documents were found but the maximum number of hits was reached.
Filtered due to size: 2 Filtered due to type: 61

Figure 14. Sample response of the NECI meta search engine for the query koala in the image databases, filtered for 
graphics. 
Figure 15. Clusters for the query "joydeep ghosh".



Document :...by clicking on . Journal Papers: Ismail Taha and Joydeep Ghosh, "A Hybrid Intelligent Architecture and
Its Application to Water Reserve... /...d to Journal of Smart Engineering Systems . Ismail Taha and Joydeep Ghosh,
"Symbolic Interpretation of Artificial Neural Networks", submitted ... /... Austin, 1996. Conference Papers: Ismail
Taha and Joydeep Ghosh, "Evaluation and Ordering of Rules Extracted from Feedforward Networks.... /...Also,
Tech. Rep. TR-97-01-106, The Computer and Vision Research Center, University of Texas, Austin, 1996.
Conference Papers: Ismail Taha an...

Document :...Joydeep Ghosh... /... Joydeep Ghosh Joydeep Ghosh Telephone: (512) 471-8980 Fax: (512) 471-5...
/... Joydeep Ghosh Joydeep Ghosh Telephone: (512) 471-8980 Fax: (512) 471-5532 E-mail: ghosh@pin... /...Fax:
(512) 471-5532 E-mail: ghosh@pine.ece.utexas.edu Address: The University of Texas at Austin Department of
Electrical & Computer Engineering...

Document :... Yoan Shin and Joydeep Ghosh Department of Electrical and Computer... /...Yoan Shin and Joydeep
Ghosh Department of Electrical and Computer Engineering The University of Texas... /...in and Joydeep Ghosh
Department of Electrical and Computer Engineering The University of Texas at Austin Austin, TX 78712 Abstract
This paper introduces a nov...
...more...

Document :... Artificial Neural Networks Authors: Bryan W. Stiles and Joydeep Ghosh Department of Electrical and 
Computer Engineering The Unive... /...rsity of Texas at Austin Correspondence: Bryan Stiles c/o Joydeep Ghosh 
Department of Electrical and Computer Engineering The Unive... /...Phone: (512) 471-2358 Email: 
bstiles@pine.ece.utexas.edu Joydeep Ghosh Department of Electrical and Computer Engineering The Univ... /... A 
Habituation Based Mechanism for Encoding Temporal Information in Artificial Neural Networks Authors: Bryan W. 
Stiles and Joydeep Ghosh Department o... /...l: ghosh@pine.ece.utexas.edu Submit to: Applications and Science of
Artificial Neural Networks Steven K. Rogers and Dennis W. Ruck at AeroSense '9...

Document :... (eds.), IEEE Press, 1995, pp 135 - 144. Bryan W. Stiles and Joydeep Ghosh, "A Habituation Based
Mechanism for Encoding Temporal Information in Art... /...E Proc. Vol. . Orlando, April 1995, pp. Bryan W. Stiles
and Joydeep Ghosh, "Habituation Based Neural Classifiers for Spatio-temporal Signals", Pro... /...Proc. ICASSP-95,
Detroit, May 1995, pp. Bryan W. Stiles and Joydeep Ghosh, "Dynamic Neural Networks for the Classification of
Oceanographic Data",... /...Ghosh, "A Habituation Based Mechanism for Encoding Temporal Information in Artificial
Neural Networks", (invited paper ) Proc. SPIE Conf. on Applications and Science of Artif... /...tworks", (invited
paper ) Proc. SPIE Conf. on Applications and Science of Artificial Neural Networks IV, SPIE Proc Vol Orlando
April 1995, pp. Bryan W. St...

Document :...uth.edu Larry D. Jackel Robert E. Schapire Y. Freund Kagan Tumer and Joydeep Ghosh Shimon
Edelman Jonathan Baxter Anders Krogh and Jesper Vedelsby ... /...ftp from
"ftp://eris.wisdom.weizmann.ac.il/pub/mam.ps.Z" Kagan Tumer and Joydeep Ghosh, "Theoretical Foundations of
Linear and Order Statistics Combiners for Ne... /...When Networks Disagree: Ensemble Methods for Neural
Networks", Chapter 10, Artificial Neural Networks for Speech and Vision, editor R.J. Mammone, Chapman-Hall,
London, 1993 ...
...more...



Figure 16. The first two cluster summaries for the query "joydeep ghosh". 
Figure 17. First part of the clusters for the query "joydeep ghosh" from HuskySearch.
Figure 18. Second part of the clusters for the query "joydeep ghosh" from HuskySearch.
Figure 19. Clusters for the query "joydeep ghosh" from AltaVista.
Figure 20. Clusters produced by the NECI meta search engine for the query "neural network"
Document :...FAQ - Typing Injury ... /... Typing Injury FAQ Home Page [TIFAQ] [General] [Keyboards] [Speech] [Mice] [Sof... FAQ -
Typing Injury ... /... Typing Injury FAQ Home Page [TIFAQ] [General] [Keyboards] [Speech] [Mice] [Software] ... /...Injury Archive,
sources of information for people with typing injuries, repetitive stress injuries, carpal tunnel syndrome, etc. The TIFAQ is targeted at
computer users suffering at the hands of their equipment. You'...

Document :... JXM's Ergonomics Page is a site by John Murray at University of Michigan that focuses on typing injuries, carpal tunnel
syndrome and design concepts. Office Working Postures is a comme... /...eral links to various safety oriented servers, lists, and
newsgroups. You do the searching. Typing Injuries is everything you ever wanted to know about typing injuries by Dan Wallach at
Princeto... /... Princeton. Lots of publications and links. Everything you wanted to know and more. Typing Injury Archive is a typing
injury library by Dan Wallach at Princeton. Here you will find a well cate... /...ons and links. Everything you wanted to know and more.
Typing Injury Archive is a typing injury library by Dan Wallach at Princeton. Here you will find a well categorized list of typing injur...
/... and Ergonomics Home Page links to several ergonomic sites that focus on safety issues, such as: carpal tunnel syndrome, back
injuries, air quality, sick building syndrome, and lighting. Lawrence Livermore Lab ... /...al technology and human factors engineering.
Mostly in-house work but an interesting site. Carpal Tunnel Syndrome is a commercial site but does have lots of references to CTS.
Special emphasis is on keyboard...

Document :... at the keyboard. Site offers Time Out For Windows, an ergonomic exercise break program. Typing Injury FAQ : This is the
home page for the Typing Injury FAQ and Typing Injury Archive. ... /... an ergonomic exercise break program. Typing Injury FAQ :
This is the home page for the Typing Injury FAQ and Typing Injury Archive. NEW! University of Minnesota Office Ergonomics...
/...are, how one gets them, and some guidelines for how one may help heal oneself from this devastating injury. Carpal Tunnel Syndrome
& Repetitive Stress Computer Related Repetitive St... /... Carpal Tunnel Syndrome & Repetitive Stress, Computer Related Repetitive
Strain Injury : I hope on this page to provide a very brief introduction to RSI for the benefit of students who... /...em, and some guidelines
for how one may help heal oneself from this devastating injury. Carpal Tunnel Syndrome & Repetitive Stress Computer Related
Repetitive Strain Injury : I hope on this page t... /...pause helps you avoid OOS / RSI with: Micropauses and Exercise Breaks. Patient's
Guide to Carpal Tunnel Syndrome : The following documents attempt to explain what Carpal Tunnel Syndrome is, how it is
diagnosed ...

Document :...Typing Injury FAQ: General Information... /... Typing Injury FAQ: General Information General Information [TIFAQ]
[General] [Key... Typing Injury FAQ: General Information... /... Typing Injury FAQ: General Information General Information
[TIFAQ] [General] [Keyboards]... /...nonym for RSI WRULD Work-Related Upper Limb Disorders - yet another synonym for RSI CTS
Carpal Tunnel Syndrome (see below) Hyperextension Marked bending at a joint. Pronation Turning the palm down... /...over the wrist
and forearm, some tenderness and it gets worse with repetitive activity. Carpal Tunnel Syndrome the nerves that run through your wrist
into your fingers get trapped by the inflamed mu...
...more...



Figure 21. Clusters produced by the NECI meta search engine for the query typing and injury along with the first 
cluster summary. 



Searching for: "NASDAQ stands for" "NASDAQ is an abbreviation" "NASDAQ means" using: HotBot
Infoseek AltaVista Excite Lycos Northern Light Yahoo WebCrawler

Tip: For better precision with multiple terms you might like to use + to ensure that the results contain specific terms (e.g.
+"lee giles" ...)

Ref:...viation for the New York Stock Exchange AMEX is an abbreviation for the American Stock Exchange
NASDAQ is an abbreviation for the National Association of Securities Dealers Automatic Quotation Exchange "Top
5% of the...

Ref:...nformation on NASDAQ and the companies traded thereon. (Incidentally, does anyone know what NASDAQ
stands for?) NYSE All about the New York Stock Exchange. Data mongers loo...

Ref:...- The NASDAQ Last-Revised: 25 Oct 1996 From: billnianr@aol.com , jeffwben@aol.com , cml@cs.umd.edu
NASDAQ is an abbreviation for the National Association of Securities Dealers Automated Quotation system. It is
also commonl...

Ref:... NASDAQ Last-Revised: 25 Oct 1996 From: hilknajir@aol.com , jeffwben@aol.com , lott@invest-faq.com
NASDAQ is an abbreviation for the National Association of Securities Dealers Automated Quotation system. It is
also commonl...

Ref:...ble for the operation and regulation of the NASDAQ stock market and over-the-counter markets. NASDAQ
Stands for the National Association of Securities Dealers Automated Quotation System. A nationwide computer...
Ref:...site Index is a value weighted index that monitors more than 2,000 stocks traded over-the-counter. NASDAQ
stands for National Association of Securities Dealers Automated Quotations. It has been available since 1971 ...
Ref:...as an incentive stock option under Section 422 of the Code, (k) "NASDAQ" means the National Association of
Securities Dealers, Inc. Automated Quotation System...

[ ...section deleted ] 

This search: "NASDAQ stands for" "NASDAQ is an abbreviation" "NASDAQ means" Search engine pages: 
Figure 22. NECI meta search engine response for the query What does NASDAQ stand for?
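
The response above shows the question being turned into three quoted phrases before it is sent to the engines. A minimal sketch of that kind of rewriting follows; the function name `transform_question` and the single regular-expression rule are assumptions for illustration, and only the three output phrases are taken from the figure.

```python
import re


def transform_question(question):
    """Rewrite 'What does X stand for?' into phrase queries likely to appear
    in an answer; any other input is returned unchanged as a single query."""
    q = question.strip().rstrip("?").strip()
    m = re.fullmatch(r"what does (.+) stand for", q, re.IGNORECASE)
    if m:
        x = m.group(1)
        return [f'"{x} stands for"', f'"{x} is an abbreviation"', f'"{x} means"']
    return [question]


queries = transform_question("What does NASDAQ stand for?")
print(" ".join(queries))
```

Other question forms, such as the "How is a rainbow created?" query of Figure 24, would need their own rules, which are not sketched here.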



Infoseek

proof of intelligent life on the net

Home | Add URL | Free Software | Help

Global Services

Related Topics

Infoseek found 23,064,238 pages containing at least one of these words: what does NASDAQ stand for?

Search Results 1-10

Hide Summaries | next 10


Stock Research NASDAQ - are you a shrewd investor? FREE copy of the best

Stock Research NASDAQ - are you a shrewd investor? FREE copy of the best Stock Research NASDAQ. Be ahead of the
Market, Stock Research NASDAQ that always call it right. Our ...
72% httpj'/www.updarc^ws.comykcy/uivcstrescarch/ (Size 5.3 K) 

CrberLand HO 

CyBerCorp designs and develops real-time decision support, execution and trading systems for NASDAQ stock market traders. 

Be sure to check out CyBerTrader. 

63% http://www.cyber-corp.com/ (Size 4.0K)

InvestQuest® AMUSEMENT & RECREATION SERVICES

AMUSEMENT & RECREATION SERVICES. [InvestQuest® Home | Company List | Industries | Company Search]
ALLIANCE GAMING CORP (NASDAQ:ALLY) ALPHA HOSPITALITY CORP (NASDAQ:ALHY) AMERICAN ...
63% http://www.investquest.com/.html/79_industry.htm (Size 11.9K)

InvestQuest® INSURANCE CARRIERS

INSURANCE CARRIERS. [InvestQuest® Home | Company List | Industries | Company Search] 20TH CENTURY
INDUSTRIES (NYSE:TW) ACCEL INTERNATIONAL CORP (NASDAQ:ACLE) ACCEPTANCE INSURANCE ...
62% http://www.investquest.com/.html/63_industry.htm (Size 26.8K)

InvestQuest® WHOLESALE TRADE - DURABLE GOODS

WHOLESALE TRADE - DURABLE GOODS. [InvestQuest® Home | Company List | Industries | Company Search] AAR
CORP (NYSE:AIR) ABATIX ENVIRONMENTAL CORP (NASDAQ:ABIX) ACE HARDWARE CORP ...
62% http://www.investquest.com/.html/50_industry.htm (Size 24.2K)

Stocks by Symbol - C

Stocks by Symbol - C C (NYSE) Chrysler CA (TSE) Canadian Airlines CANOTC%3 AMGIS (OTC) Magisoft Software Corp
CAWS+ (NASDAQ) CAI Wireless Systems Inc. CBMI (NASDAQ) Creative ...
62% http://stockclub.com/stocks/symbol-c-index.html (Size 9.6K)

Stocks by Company Name - A

A+ Communications (NASDAQ:ACOM) A. G. Edwards (NYSE:AGE) Abatix Environmental (NASDAQ:ABIX) access health
(NASDAQ:ACCS) acclaim (NASDAQ:AKLM) Ackerley Communications (NASDAQ:AK) ...
62% http://stockclub.com/stocks/name-a-index.html (Size 8.5K)



InvestQuest® RUBBER AND MISC. PLASTICS PRODUCTS

RUBBER AND MISC. PLASTICS PRODUCTS. [InvestQuest® Home | Company List | Industries | Company Search]
ADVANCED MATERIALS GROUP INC (NASDAQ:ADMG) AEP INDUSTRIES INC (NASDAQ:AEPI) ...
62% http://www.investquest.com/.html/30_industry.htm (Size 10.6K)
InvestQuest® FABRICATED METAL PRODUCTS

FABRICATED METAL PRODUCTS. [InvestQuest® Home | Company List | Industries | Company Search] AAVID
THERMAL TECHNOLOGIES INC (NASDAQ:AATT) ABC RAIL PRODUCTS CORP (NASDAQ:ABCR) ABS ...
62% http://www.investquest.com/.html/34_industry.htm (Size 13 JK)

InvestQuest® ENGINEERING & MANAGEMENT SERVICES

ENGINEERING & MANAGEMENT SERVICES. [InvestQuest® Home | Company List | Industries | Company Search]
ADVANCED DETECTORS INC (OTC Bulletin Board:3ADET) AERO SYSTEMS ENGINEERING INC ...
62% http://www.investquest.com/.html/87_industry.htm (Size 15.5K)
Figure 23. Response of Infoseek for the query What does NASDAQ stand for?




Searching for: "rainbow is created" ... "rainbow is produced" "rainbow is made"
using: ... HotBot Northern Light Yahoo WebCrawler

Tip: For better precision with multiple terms you might like to use + to ensure that the results contain specific terms (e.g. ...

Ref:... the green flash, it helps to know how our atmosphere effects sunlight. Coincidentally, the phenomenon
responsible for the green flash is also the one that paints rainbows across Hawaii's sky. A rainbow is created when
rays of sunlight enter a raindrop, bounce around inside, and exit. Light from the sun consists of a potpourri of colors
that are each bent by a different amount inside a raindrop. This unequa...

Ref:...scapes the raindrop after it is reflected once. A part of the ray is reflected again and travels along inside the drop
to emerge from the drop. The rainbow we normally see is called the primary rainbow and is produced by one internal
reflection; the secondary rainbow arises from two internal reflections and the rays exit the drop at an angle of 50
degrees rather than the 42 degrees for the red primary bow. ...

Ref:...e rainbow we do not see the sun, and we rarely see a rainbow in winter. How do we explain this appearance of a
bow, double bows, size of arc, and brightness of the rainbow? Answer The rainbow is produced by sunlight passing
through a raindrop or a collection of rain drops. A typical raindrop is spherical and as a light ray strikes the surface of
the raindrop, some light is reflected and some passes ...

Ref:...se to us. He promised that the earth will never be destroyed again by a flood. As a sign of that promise He put a
rainbow in the sky. Whenever we see a rainbow, we can think of God's promise. The rainbow is made up of all the 
colors. Back To Index Next Page... Page 1 ... 

Ref:...two rainbows, the narrower male rainbow and the wider female. The male rainbow can not stop the rain by itself.
When it is followed by the female the rain stops. Other Native Americans believe the rainbow is made from the souls
of wild flowers that lived in the forest and lilies from the prairies. A Japanese myth tells of the first man Isanagi and the
first woman Isanami who stood on the floating bri... /...te of samara before the clear light of Nirvana or heaven. In
Arabia the rainbow is a tapestry draped by the hands of the south wind. It is also called the cloud's bow or Allah's bow.
In Islam the rainbow is made up of four colors red, yellow, green and blue related to the four elements. In myths of
India the Goddess Indra not only carries a thunderbolt like the Greek God Zeus but she also carries a ...
Ref:.... true b. false 13. The average speed of light is greatest in _. a. red glass, b. orange glass, c. green glass, d. blue
glass, e. is the same in all of these. 14. The secondary rainbow is produced with an extra (choose the best answer) a.
dispersion, b. reflection, c. refraction, d. diffraction. 15. If a person has green cones that are weak, then yellow light
will appear _ to t...

Ref:...ever wonder what makes the color in a rainbow ? The answer is sunlight. It has all of the colors of the rainbow in
it, but they are all mixed up together so you are not able to see them. The rainbow is made up of drops of water.
When sunlight passes through a drop of water, it bends and the colors inside the light split apart and are separated so
that we can see them. When the sunlight passes through...

[ ...section deleted... ] 



Figure 24. NECI meta search engine response for the query How is a rainbow created? 
Hayden Books: Creative Techniques

Creative Techniques brought to you by Hayden Books Working with Layers: Creating a Rainbow Effect Art by Gary Poyssick
Comments: This tip shows you how to use layers and Photoshop's

64% http^/wwjncpxom/169470081705imayden/serie^ (Size4.2K) 
Rainbow Sports Networks and The Sporting News Create Alliance 

World Data Web Connect InsideMedia August 16, 1996 Rainbow Sports Networks and The Sporting News Create Alliance 

Rainbow Programming Holdings' sports networks - NewSport, Prime and ... 

64% http'7/ww.niediaccntral.com/M (Size3.9K) 

Hayden Books: Creative Techniques

Creative Techniques brought to you by Hayden Books Working with Layers: Creating a Rainbow Effect Art by Gary Poyssick 

Comments: This tip shows you how to use layers and Photoshop's ... 

64% hnp://wwwjncp.com/l 8229751 149932^ay<jenyserics/techniqucs/52^7]dcx.htmi <Stze4 2K) 

Rainbow Warriors? 

Rainbow Warriors? Hacker All the colors of the rainbow... The appropriate excerpt from the alt2600 FAQ. You are left on your 
own recognizance 

63% httpj/vrw.jhcloosxom/sjgzrncs/hxktx/chion^^ (Size 3.8K) 
Pet Loss and Rainbow Bridge 

Rainbow Bridge and Pet Loss grief pages, may post poems, photos, tributes or just stop by and be comforted. 
62% hap^/ww.prinKneLcom/~meg^c/bridgC-hmi (Size 46J2K) 

The sky's the limit: student activities

We all see something different when we look up to the sky. The clouds often stir our imagination allowing us to see animated 

images being formed by those mysterious "puffs of ... 

62% htlp://wwjoludcHisjbnxcorii/kI2/tcacher/doudss±tml (Size9.6K) 



Asymetrix 3D F/Xtm Drag and Drop 3D for Windows screenshot Create 

high-quality, professionally rendered three-dimensional images and animations with Asymetrix 3D F/X. You can easily add 
dazzling 3-D effects and sophisticated animation to any ... 
62% http://3dsite.ccm/3dsi te/cgi/softwar^ (Size 6.9K) 

Rainbow Video Authoring Services 

CD-ROM Authoring and Web Development Services. Not satisfied with offering the best in regional video production, 
Rainbow Video offers complete CD-ROM authoring services for both the ...
61% http^Avww^nbowvideoxoin/author.htm {Size UK)
Bio

Bio [TOC Icon][Feedback Icon] photo by Wink Van Kempen Fred Stern The Rainbow Maker Fred Stern was raised in New
York and is an acknowledged innovator in environmental art. He has ...
61% http://www.zianet.com/rainbow/bio.htm (Size 4.1K)

Oregon Department of Fish and Wildlife Weekly Fishing Report

Oregon Department of Fish and Wildlife. Now sorted by zones!. Updated: August 7, 1997. * Denotes scheduled stocking

Zones. Northwest Zone | Southwest Zone | Willamette Zone | ...

60% http://www.dfw.state.or.u^ (Size 40.5K)



Hide Summaries | next 10



Copyright © 1995-97 Infoseek Corporation. All rights reserved.
Infoseek incorporates LinguistX technology from InXight
Disclaimer 



Figure 25. Response of Infoseek for the query How is a rainbow created?
Searching for: "mealy machine is" "mealy machine refers to" "mealy machine means" "mealy machine will"
"mealy machine helps" using: HotBot Infoseek AltaVista Excite Lycos Northern Light Yahoo WebCrawler

Tip: for better precision with multiple terms you might like to use ... to ensure that the results contain ...

Ref:...L such that all state memory changes are made with respect to the clock signal. T F A Moore machine usually
has less states than an equivalent Mealy machine. T F A potential problem with a Mealy Machine is that the output 
changes are not synchronized with clock changes. Fill in the blanks. 10 points at 2 points per blank The canonical SOP 
form of an expression results in a level circuit. 

Ref:... input alphabet and by creating multiple input mechanisms for reading events. Second, the transition function
must be modified so that controller tasks can be performed during state transitions. A Mealy machine is a DFA that
defines symbols which are output during state transitions. For the current purpose, a similar mechanism is used to
perform controller tasks such as moving a robot, opening a vise of fi... 

Ref :... these general premises to the Collatz conjecture, which of all open problems at the moment is perhaps the most
conveniently conducive to the approach. = Generalized Sequential Machines, GSMs A Mealy machine is a Finite
State Automaton with a single output symbol associated with each state transition (e.g. see [1, p.42]). A GSM or
Generalized Sequential Machine is similar it is a FSA with an output strin... 

Ref :... next state which then effect the output. (State refers to all latched events and values.) Argument for Mealy is
that the output depends on the transition, thus ignoring the buffers, the CFSM is a Mealy machine. (Will explore this
more later.) Issues concerning composition have not been resolved by the Polis group, there is no composition as it
stands. Resources A Formal Methodology for Hardware/Softwar... 

Ref :... State Machines We consider two types of state machines, Moore and Mealy. A Moore machine is a Mealy
machine whose output does not directly depend on its input. Mealy Machines A Mealy Machine is a 6-tuple M = ( S,
D, Q, q_0, D(a,q), l(a,q) ) where S != 0 is a finite set of input symbols (we will use a to denote a particular input
symbol) D != 0 is a finite set...

Ref:...ving on state register flip-flops, it is still desirable to use them. This leads to alternative synchronous design
styles for Mealy machines. Simply stated, the way to construct a synchronous Mealy machine is to break the direct
connection between inputs and outputs by introducing storage elements. One way to do this is to synchronize the
Mealy machine outputs with output flip-flops. See Figure 3...

Ref: ...itions. A FSA is called non-deterministic if there is one or more transitions from one state to another for a given
input. A Moore machine is an FSA which associates an output with each state and a Mealy machine is an FSA which 
associates an output with each transition. The Moore and Mealy FSAs are important in applications of FSAs. 
Equivalence of deterministic and non-deterministic fsa It might seem ... 

Ref:...icle, and you will make use of this three-block model to describe a state machine in VHDL using our four-step
design procedure. Moreover, the outputs of a state machine define its type. That is, a Mealy machine is one in which
the outputs are a function of both the inputs and the current state-variables (Figure 1). A Moore machine has outputs
that are a function of the state-variables only (Figure 2). And a VI...

Ref :... 60 Study pages 7 through 13 of module 1.1 thoroughly. 20 If you do not know (or no longer know so well) what a
state machine, a state diagram, a Moore machine or a Mealy machine is, then look up what you do not (or no longer) know
in your book(s) on digital techniques. 30. Do the practice exercises on page 13. 10 Read the rest of module 1.1
superficially ...
Figure 26. NECI meta search engine response for the query What is a mealy machine?

Figure 27. Sample home page showing new hits for a query and recently modified URLs.

Figure 28. Sample page view showing the text which has been added to the page since the last time it was viewed.

Figure 29. Coverage of each engine with respect to the combined coverage of all 6 (averaged over 500 queries).




Figure 30. Coverage as the number of search engines is increased (averaged over 500 queries). The extrapolation is
created using the assumption that the coverage increases logarithmically with the number of search engines.
Significantly more documents are returned as the number of search engines is increased.
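
The logarithmic extrapolation described in the caption of Figure 30 can be illustrated with a short least-squares fit. This is only a sketch: the model form coverage = a + b * ln(n), the function names, and the sample values are assumptions made for illustration and are not taken from the specification or its measured data.

```python
import math

def fit_log_coverage(counts, coverage):
    """Least-squares fit of coverage = a + b * ln(n) (assumed model form)."""
    xs = [math.log(n) for n in counts]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(coverage) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, coverage))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def extrapolate(a, b, n):
    """Predicted coverage for n engines under the fitted logarithmic model."""
    return a + b * math.log(n)

# Hypothetical relative coverage for 1..6 engines (illustrative values only,
# NOT the measurements behind Figure 30).
engines = [1, 2, 3, 4, 5, 6]
relative_coverage = [0.40, 0.62, 0.76, 0.86, 0.94, 1.00]

a, b = fit_log_coverage(engines, relative_coverage)
for n in (7, 8, 10):
    print(n, round(extrapolate(a, b, n), 3))
```

Under this assumed model each additional engine adds less coverage than the one before it, which matches the qualitative shape the caption describes.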




Figure 31. In order to estimate the size of the indexable Web (the Web excluding pages not considered by the search 
engines), we compare the overlap between engines to the number of documents returned from all 6 engines combined. 
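
One common way to turn an overlap between two engines into a size estimate is a capture-recapture calculation. The sketch below is an assumption about how such a comparison could be carried out, not a statement of the procedure behind Figure 31: it treats the two engines as sampling the indexable Web independently, and the function names and document identifiers are hypothetical.

```python
def estimate_indexable_size(docs_a, docs_b):
    """Capture-recapture style estimate: N is roughly |A| * |B| / |A and B|.

    Assumes engines A and B index documents independently, which real
    engines only approximate.
    """
    overlap = len(docs_a & docs_b)
    if overlap == 0:
        raise ValueError("no overlap between engines; estimate is undefined")
    return len(docs_a) * len(docs_b) / overlap

def combined_fraction(all_docs, estimated_size):
    """Fraction of the estimated indexable Web covered by all engines combined."""
    return len(all_docs) / estimated_size

# Hypothetical document identifiers returned by two engines for the same queries.
engine_a = set(range(0, 60))
engine_b = set(range(40, 100))

size = estimate_indexable_size(engine_a, engine_b)
print(size, combined_fraction(engine_a | engine_b, size))
```

If the engines tend to index the same popular pages, the overlap is inflated and the estimate comes out low, which is consistent with the remark in the caption of Figure 32 that the estimate is expected to be lower than the true value.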




Figure 32. Coverage of each engine with respect to the estimated size of the indexable Web (the estimate is expected
to be lower than the true value).

Figure 33. Histograms of the major search engine response times, continued in the next figure.



Panels: Infoseek Response Time, Lycos Response Time, Northern Light Response Time (horizontal axis: Time); Response Time for the First of 6 Web Search Engines.

Figure 34. (Continued from the previous figure) Histograms of the major search engine response times (above, in
seconds), and a histogram of the response time for the first response when queries are made to the six engines
simultaneously (below, created from 10,000 samples drawn from the previous distributions). The frequency is normalized
so that it represents the percentage of responses that fall within each section of the histogram. The last section of the
histograms also contains all samples with longer times.
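
The construction of the lower histogram in Figure 34 can be sketched as a resampling simulation: draw one response time per engine, keep the smallest, and repeat 10,000 times. The per-engine times, the bin width, the number of bins, and the function names below are hypothetical choices for illustration and do not reproduce the measured distributions.

```python
import random

def first_response_samples(engine_times, n_samples=10_000, seed=0):
    """Draw one observed time per engine and keep the minimum, n_samples times."""
    rng = random.Random(seed)
    return [
        min(rng.choice(times) for times in engine_times.values())
        for _ in range(n_samples)
    ]

def percent_histogram(samples, bin_width=1.0, n_bins=12):
    """Percentage of samples per bin; the last bin also holds all longer times."""
    counts = [0] * n_bins
    for t in samples:
        index = min(int(t // bin_width), n_bins - 1)
        counts[index] += 1
    return [100.0 * c / len(samples) for c in counts]

# Hypothetical observed response times in seconds for three engines
# (placeholder names and values, not data from the specification).
engine_times = {
    "engine_1": [0.8, 1.2, 2.5, 6.0, 15.0],
    "engine_2": [1.1, 1.9, 3.0, 4.2, 9.5],
    "engine_3": [0.6, 2.2, 2.8, 7.5, 20.0],
}

samples = first_response_samples(engine_times)
for i, pct in enumerate(percent_histogram(samples)):
    print(f"{i}-{i + 1} s: {pct:.1f}%")
```

Because only the fastest engine matters for the first response, the simulated distribution is concentrated at shorter times than any single engine's distribution.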
Figure 36. Median time for the first of n Web search engines to respond.

Figure 37. Response time for arbitrary Web pages.

Figure 38. Median time to download the first of n pages requested simultaneously.

Response Time for the First Result from the Meta Engine
Figure 39. Time for the meta engine to display the first result.